Thanks for the answer. The problem is: how can I get the extracted characters and their coordinates with PDFBox version 2.0.0?
The example refers to an older version of PDFBox.

Sent: Thursday, 31 March 2016 at 19:58
From: "Tilman Hausherr" <[email protected]>
To: [email protected]
Subject: Re: Extract Text of Document with coordinates

On 31.03.2016 at 12:51, Felix Hermann wrote:
> Hello,
>
> how can I extract the text + coordinates of a PDF document?
>
> To be more precise: I would like to extract all words of the document. And
> for each word I need the coordinates of this word.
>
> If PDFBox does not support this: How can I get the coordinates of each
> character?
>
> I tried to adapt the code of this example:
> https://gist.github.com/DavidYKay/82f20ba67c50c499ebb3

Yes, the printtextlocations (or DrawPrintTextLocations) example is a good
start. Look for the blanks and build words from there.

Tilman

> However, I was not successful, as I use the new PDFBox version. (2.0.0)
>
> Regards
>
> Felix
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
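
A rough sketch of the "look for the blanks and build words from there" idea, in plain Java without any PDFBox dependency. The classes Glyph, Word and WordGrouper are hypothetical names made up for this illustration; they are not part of PDFBox. The assumption is that in PDFBox 2.0 you would subclass PDFTextStripper, override writeString(String, List<TextPosition>), and fill one Glyph per TextPosition from getUnicode(), getXDirAdj(), getYDirAdj(), getWidthDirAdj() and getHeightDir(). The sketch also assumes the glyphs arrive in reading order and all belong to one line; it does not handle line breaks or words separated only by a gap.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for one character and its position (not a PDFBox class). */
final class Glyph {
    final String text;
    final float x, y, width, height;

    Glyph(String text, float x, float y, float width, float height) {
        this.text = text;
        this.x = x;
        this.y = y;
        this.width = width;
        this.height = height;
    }
}

/** Hypothetical holder for one word and its bounding values. */
final class Word {
    final String text;
    final float x, y, width, height;

    Word(String text, float x, float y, float width, float height) {
        this.text = text;
        this.x = x;
        this.y = y;
        this.width = width;
        this.height = height;
    }
}

final class WordGrouper {

    /** Splits a sequence of glyphs into words at every blank glyph. */
    static List<Word> group(List<Glyph> glyphs) {
        List<Word> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        float startX = 0, y = 0, endX = 0, maxHeight = 0;

        for (Glyph g : glyphs) {
            if (g.text.trim().isEmpty()) {
                // A blank ends the word collected so far (if any).
                addWord(words, current, startX, y, endX, maxHeight);
            } else {
                if (current.length() == 0) {
                    // First character of a new word: remember where it starts.
                    startX = g.x;
                    y = g.y;
                    maxHeight = 0;
                }
                current.append(g.text);
                endX = g.x + g.width;
                maxHeight = Math.max(maxHeight, g.height);
            }
        }
        // Flush the last word when the input does not end with a blank.
        addWord(words, current, startX, y, endX, maxHeight);
        return words;
    }

    private static void addWord(List<Word> words, StringBuilder current,
                                float startX, float y, float endX, float height) {
        if (current.length() > 0) {
            words.add(new Word(current.toString(), startX, y, endX - startX, height));
            current.setLength(0);
        }
    }
}
```

Each resulting Word carries the x and y of its first character, a width that reaches to the right edge of its last character, and the height of its tallest character. Whether those values are in the coordinate system you need depends on which TextPosition getters you read them from.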

