Hi,
See the DrawPrintTextLocations.java example. Use the cyan rectangles
there; these are the actual glyph bounds. The red ones are heuristic.
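To illustrate the general idea behind real bounds, here is a minimal sketch that uses only java.awt.geom, not PDFBox itself. It takes a box in glyph space (1/1000 text units, the usual convention for non-Type3 fonts) and maps it into user space through a text rendering matrix, then flips it to a top-left origin. The class name, method names and numbers are hypothetical. In PDFBox the matrix would presumably come from textPosition.getTextMatrix().createAffineTransform() and the box from the font (for example its bounding box or glyph path), but check the example source for the exact calls.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;

// Hypothetical helper, not part of PDFBox.
class GlyphBoundsSketch {

    // Maps a box given in glyph space (1/1000 units) into user space.
    // Assumes a non-Type3 font, so glyph space is scaled by 1/1000.
    static Rectangle2D glyphBoxToUserSpace(Rectangle2D glyphBox,
                                           AffineTransform textRenderingMatrix) {
        AffineTransform at = new AffineTransform(textRenderingMatrix);
        // scale() is concatenated, so it is applied to the points first
        at.scale(1 / 1000.0, 1 / 1000.0);
        return at.createTransformedShape(glyphBox).getBounds2D();
    }

    // Converts a PDF user-space box (origin bottom-left) into a box
    // whose y coordinate is measured from the top of the page.
    static Rectangle2D toTopLeftOrigin(Rectangle2D box, double pageHeight) {
        return new Rectangle2D.Double(box.getX(), pageHeight - box.getMaxY(),
                box.getWidth(), box.getHeight());
    }

    public static void main(String[] args) {
        // Assumed values: 12 pt text placed at (100, 700) on a 792 pt high page
        AffineTransform trm = new AffineTransform(12, 0, 0, 12, 100, 700);
        // Assumed glyph box: descent -200, total height 1100, advance 500
        Rectangle2D glyphBox = new Rectangle2D.Double(0, -200, 500, 1100);

        Rectangle2D user = glyphBoxToUserSpace(glyphBox, trm);
        System.out.println(user);
        System.out.println(toTopLeftOrigin(user, 792));
    }
}
```

Note that the box starts below the baseline (the descent) and covers the full height, which is why it comes out larger than a rectangle built from getHeight() alone.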
Tilman
On 18.02.2020 at 21:23, Luca Loiodice wrote:
Hello,
I am having problems extracting precise character-level text coordinates
from a PDF.
I have overridden PDFTextStripper's writeString(String text,
List<TextPosition> textPositions) to access the per-character information.
This is the code I use to extract the values from the TextPosition fields
and pass them to my CharacterTextPosition object:
CharacterTextPosition characterTextPosition = new CharacterTextPosition();
characterTextPosition.SetCharacterText(textPosition.getUnicode());
characterTextPosition.SetLeft(textPosition.getXDirAdj());
characterTextPosition.SetBottom(pdPage.getMediaBox().getHeight() - textPosition.getYDirAdj());
characterTextPosition.SetWidth(textPosition.getWidthDirAdj());
characterTextPosition.SetHeight(textPosition.getHeight());
int characterDirection = (int) textPosition.getDir();
characterTextPosition.SetOrientation(characterDirection);
Here is a PDF of a PowerPoint slide with the extracted text coordinates
drawn on it as rectangles:
https://www.dropbox.com/s/hp1dape5mp2l8ti/PPT_Slide_CommonFont.pdf.text_extraction_rectangles.pdf?dl=0
As you can see, the rectangles are smaller than the characters.
I feel this might have to do with the font. I see textPosition.getFont(),
which returns the font information for the TextPosition.
Is there a way to adjust my code to get more accurate coordinates?
Thanks a lot,
Luca