I've tried it now, but it made no difference. I also explained the problem incorrectly; here's what actually happens:
The 1st line in the PDF file is:

131 Comments are made from 1905, / See: Certain Neurotic Mechanisms in

Here "131" is normal text, while the rest of the line has "Subscript" formatting. If I copy/paste the line from the PDF manually, it is copied in the right order, but when I extract the text using PDFBox, it comes out like this:

Comments are made from 1905, / See: Certain Neurotic Mechanisms in 131

The subscript text is being read before the "131" number.

Best regards,
Hesham

----------------------------------------------------------------------------
Included Message:

On 17.11.20 at 07:54, Hesham Gneady wrote:
> Hi,
>
> I am trying to read this PDF file using
> PDFTextStripper.processTextPosition():
>
> https://dl.dropboxusercontent.com/s/o660xrp4sgp9tbv/PDFTextStripper%20reading%20sample.pdf?dl=0
>
> But when I do that it reads it with wrong order. It reads the 2nd line
> before the 1st line because the 1st line has Subscript effect. Is
> there a way to read it right ordered?

In a PDF, the text doesn't necessarily appear in rendering order. You should give the sort option a try:

org.apache.pdfbox.text.PDFTextStripper.setSortByPosition(boolean)

Andreas
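
For readers following the thread, here is a rough sketch of the general idea behind position-based ordering: fragments whose vertical positions differ only slightly (as a subscript run and normal text on the same line might) are grouped into one line and then ordered left to right. This is plain Java and is not PDFBox's implementation of setSortByPosition. The Fragment class, the tolerance parameter, and all coordinates are invented for illustration and were not measured from the sample PDF.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class PositionSortSketch {

    /** Hypothetical stand-in for a run of text and its position on the page. */
    static final class Fragment {
        final String text;
        final double x; // left edge; grows to the right
        final double y; // vertical position; assumed to grow downward

        Fragment(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Groups fragments into lines (vertical distance from the first fragment
     * of the line within the tolerance), then orders each line left to right.
     */
    static String joinInReadingOrder(List<Fragment> fragments, double tolerance) {
        List<Fragment> byY = new ArrayList<>(fragments);
        byY.sort(Comparator.comparingDouble((Fragment f) -> f.y));

        StringBuilder out = new StringBuilder();
        List<Fragment> line = new ArrayList<>();
        for (Fragment f : byY) {
            // Start a new line once the vertical gap exceeds the tolerance.
            if (!line.isEmpty() && f.y - line.get(0).y > tolerance) {
                appendLine(out, line);
                line.clear();
            }
            line.add(f);
        }
        appendLine(out, line);
        return out.toString();
    }

    private static void appendLine(StringBuilder out, List<Fragment> line) {
        if (line.isEmpty()) {
            return;
        }
        line.sort(Comparator.comparingDouble((Fragment f) -> f.x));
        if (out.length() > 0) {
            out.append('\n');
        }
        for (int i = 0; i < line.size(); i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(line.get(i).text);
        }
    }

    public static void main(String[] args) {
        // Fragments listed in the (wrong) order reported above; coordinates are made up.
        List<Fragment> streamOrder = List.of(
                new Fragment("Comments are made from 1905,", 90.0, 102.0),
                new Fragment("131", 72.0, 100.0));
        System.out.println(joinInReadingOrder(streamOrder, 3.0));
    }
}
```

The tolerance is the key design choice in this sketch: too small, and a shifted run such as a subscript is treated as its own line; too large, and adjacent lines merge.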