Hello, I am having problems extracting precise character-level text coordinates from a PDF.
I have overridden PDFTextStripper's writeString(String text, List<TextPosition> textPositions) to access the per-character information. This is the code I use to extract the values from each TextPosition and pass them to my CharacterTextPosition object:

    CharacterTextPosition characterTextPosition = new CharacterTextPosition();
    characterTextPosition.SetCharacterText(textPosition.getUnicode());
    characterTextPosition.SetLeft(textPosition.getXDirAdj());
    characterTextPosition.SetBottom(pdPage.getMediaBox().getHeight() - textPosition.getYDirAdj());
    characterTextPosition.SetWidth(textPosition.getWidthDirAdj());
    characterTextPosition.SetHeight(textPosition.getHeight());
    int characterDirection = (int) textPosition.getDir();
    characterTextPosition.SetOrientation(characterDirection);

Here is a PDF of a PowerPoint slide with the extracted character rectangles drawn on top of the text:

https://www.dropbox.com/s/hp1dape5mp2l8ti/PPT_Slide_CommonFont.pdf.text_extraction_rectangles.pdf?dl=0

As you can see, the rectangles are smaller than the characters. I feel this might have to do with the font. I see that textPosition.getFont() returns the font information for the TextPosition.

Is there a way to adjust my code to get more accurate coordinates?

Thanks a lot,
Luca
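
P.S. To make the question more concrete, here is a rough standalone sketch of the kind of adjustment I have in mind. It is only the arithmetic, with no PDFBox types, and it rests on assumptions I have not verified: that getYDirAdj() gives the baseline measured from the top of the page, and that the font reports ascent and descent in 1/1000 em units with a negative descent (I believe these would come from the font descriptor of textPosition.getFont(), scaled by the font size in points). The names GlyphBoxSketch, Box and fromBaseline are made up for this example, and the numbers in main are hypothetical.

```java
// Standalone sketch: plain arithmetic only, no PDFBox types.
// All names here (GlyphBoxSketch, Box, fromBaseline) are hypothetical.
final class GlyphBoxSketch {

    /** Rectangle in bottom-left-origin page coordinates, in points. */
    static final class Box {
        final double left;
        final double bottom;
        final double width;
        final double height;

        Box(double left, double bottom, double width, double height) {
            this.left = left;
            this.bottom = bottom;
            this.width = width;
            this.height = height;
        }
    }

    /**
     * Builds a box that spans from the font's descent to its ascent,
     * instead of using the baseline as the bottom edge.
     */
    static Box fromBaseline(double xDirAdj, double yDirAdj, double widthDirAdj,
                            double pageHeight, double fontSizePt,
                            double ascentUnits, double descentUnits) {
        // ASSUMPTION: yDirAdj is the baseline, measured from the top of the page.
        double baseline = pageHeight - yDirAdj;

        // ASSUMPTION: ascent/descent are in 1/1000 em; descent is negative.
        double ascent = ascentUnits / 1000.0 * fontSizePt;
        double descent = descentUnits / 1000.0 * fontSizePt;

        // Bottom edge drops below the baseline by the descent;
        // height covers descent plus ascent.
        return new Box(xDirAdj, baseline + descent, widthDirAdj, ascent - descent);
    }

    public static void main(String[] args) {
        // Hypothetical values: 540 pt page height, 12 pt font.
        Box b = fromBaseline(100.0, 200.0, 6.0, 540.0, 12.0, 905.0, -212.0);
        System.out.printf("left=%.3f bottom=%.3f width=%.3f height=%.3f%n",
                b.left, b.bottom, b.width, b.height);
    }
}
```

If this is roughly the right idea, I would call something like fromBaseline from inside writeString in place of the SetBottom/SetHeight lines above, but I am not sure those assumptions hold for every font.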