On Sun Jan 23 10:02:08 PST 2022 rc...@pobox.com said:
>I am using PDFBox's PDFTextStripper.getText() for a particular kind of
>PDF file generated by a government agency, and the text I'm getting does
>not match that displayed by Acrobat Reader for the same files. The
>getText() calls occasionally get characters Reader does not display, and
>in one case getText() gets an "O" instead of the "U" displayed by
>Reader. I would like to know if there's some way I can get same text as
>Reader displays.

Have you checked for embedded fonts in the PDF? It's quite possible to have fonts where the code for "A" is NOT the same as the ASCII "A".

--
World's only All Electric F-250 truck!
http://john.casadelgato.com/Electric-Vehicles/1995-Ford-F-250
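
To illustrate the point about character codes, here is a minimal sketch in plain Java (no PDFBox involved). The class name, the method names, and the code table are all made up for the example; they are not taken from the poster's files or from any library. It assumes a hypothetical embedded font whose private encoding draws the "U" glyph for code 0x4F, which is the ASCII code for "O". A viewer draws the glyph, while anything that reads the raw codes as ASCII sees a different letter.

```java
import java.util.Map;

final class SubsetFontDemo {
    // Hypothetical code-to-text table for an embedded font with a private
    // encoding. Code 0x4F is ASCII 'O', but this font draws a "U" for it.
    static final Map<Integer, String> TO_UNICODE = Map.of(
            0x4F, "U",
            0x53, "S");

    // Naive extraction: treat every character code as an ASCII value.
    static String decodeAsAscii(int[] codes) {
        StringBuilder sb = new StringBuilder();
        for (int code : codes) {
            sb.append((char) code);
        }
        return sb.toString();
    }

    // Font-aware extraction: look each code up in the font's own table and
    // emit U+FFFD (the replacement character) when the code has no entry.
    static String decodeWithFontMap(int[] codes, Map<Integer, String> toUnicode) {
        StringBuilder sb = new StringBuilder();
        for (int code : codes) {
            sb.append(toUnicode.getOrDefault(code, "\uFFFD"));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] codes = {0x4F, 0x53};
        System.out.println(decodeAsAscii(codes));                 // prints OS
        System.out.println(decodeWithFontMap(codes, TO_UNICODE)); // prints US
    }
}
```

If the font in the real file carries no usable mapping from its codes back to Unicode, or carries a wrong one, a text extractor can end up with the first result while the viewer shows the second. In PDFBox, PDFont.isEmbedded() is one place to start checking which fonts a page uses and whether they are embedded.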