Thanks Maruan!
Are you sure you're reading the PDF using PDFTextStripper.processTextPosition()? I've tried it again now using PDFBox 2.0.21 and I'm getting different results, even when calling setSortByPosition(true):

Comments are made from 1905, / See: Certain Neurotic Mechanisms in 131Jealousy, Paranoia, and Homosexuality. (Internat. Journ. Psycho-Analysis, vol. iv, April, 1923.) Freud, S. / A response to a mother’s concern about her son’s homosexuality 1935 -Letters of Sigmund Freud. E. L. Freud (Ed.). New York, NY: Basic Books. P 423. In this letter Freud links homosexuality to ‘arrested development.’ Allan Schore, Affect Regulation and the Origin of the self, Lawrence Erlbaum 1321994. p 24

Best regards,
Hesham

--------------------------------------------------------------------------------------------------
Included Message:

That's what I'm getting using the -sort option using PDFBox 2.0.21:

131 Comments are made from 1905, / See: Certain Neurotic Mechanisms in Jealousy, Paranoia, and Homosexuality. (Internat. Journ. Psycho- Analysis, vol. iv, April, 1923.) Freud, S. / A response to a mother’s concern about her son’s homosexuality 1935 -Letters of Sigmund Freud. E. L. Freud (Ed.). New York, NY: Basic Books. P 423. In this letter Freud links homosexuality to ‘arrested development.’
132 Allan Schore, Affect Regulation and the Origin of the self, Lawrence Erlbaum 1994. p 24

BR
Maruan

> > Best regards,
> > Hesham
> >
> > ----------------------------------------------------------------------
> > Included Message:
> >
> > On 17.11.20 at 07:54, Hesham Gneady wrote:
> > > Hi,
> > >
> > > I am trying to read this PDF file using
> > > PDFTextStripper.processTextPosition():
> > >
> > > https://dl.dropboxusercontent.com/s/o660xrp4sgp9tbv/PDFTextStripper%20reading%20sample.pdf?dl=0
> > >
> > > But when I do that it reads it in the wrong order. It reads the 2nd
> > > line before the 1st line because the 1st line has a subscript effect.
> > > Is there a way to read it in the right order?
> >
> > In a PDF the text doesn't necessarily appear in the rendering order.
> > You should give the sort option a try:
> >
> > org.apache.pdfbox.text.PDFTextStripper.setSortByPosition(boolean)
> >
> > Andreas
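
A note for anyone finding this thread later: output such as "131Jealousy" and "1321994" is what one would expect if text runs were ordered strictly by vertical position and the footnote numbers sit slightly lower than the first line of each footnote. That is only a guess about this particular file; nothing in the thread confirms it. The plain-Java sketch below does not call PDFBox at all. The Run class, the coordinates, and the tolerance of 4 are made up for illustration. It compares a strict y-then-x sort with a grouping that treats runs whose y values differ by at most a tolerance as belonging to the same line.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class BaselineTolerantOrder {

    /** Hypothetical holder for one text run; y grows downward (assumption). */
    static final class Run {
        final String text;
        final double x;
        final double y;

        Run(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /** Strict ordering: smaller y first, ties broken by x. */
    static String strictOrder(List<Run> runs) {
        List<Run> sorted = new ArrayList<>(runs);
        sorted.sort(Comparator.comparingDouble((Run r) -> r.y)
                .thenComparingDouble(r -> r.x));
        return join(sorted);
    }

    /**
     * Tolerant ordering: a run joins the current line when its y is within
     * `tolerance` of that line's first run; each line is then sorted by x.
     */
    static String tolerantOrder(List<Run> runs, double tolerance) {
        List<Run> byY = new ArrayList<>(runs);
        byY.sort(Comparator.comparingDouble((Run r) -> r.y));

        List<List<Run>> lines = new ArrayList<>();
        for (Run r : byY) {
            List<Run> last = lines.isEmpty() ? null : lines.get(lines.size() - 1);
            if (last != null && Math.abs(r.y - last.get(0).y) <= tolerance) {
                last.add(r);
            } else {
                List<Run> line = new ArrayList<>();
                line.add(r);
                lines.add(line);
            }
        }

        List<Run> ordered = new ArrayList<>();
        for (List<Run> line : lines) {
            line.sort(Comparator.comparingDouble((Run r) -> r.x));
            ordered.addAll(line);
        }
        return join(ordered);
    }

    /** Joins run texts with single spaces. */
    static String join(List<Run> runs) {
        StringBuilder sb = new StringBuilder();
        for (Run r : runs) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(r.text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up positions: the footnote number is 3 units below line 1.
        List<Run> runs = new ArrayList<>();
        runs.add(new Run("Allan Schore, Affect Regulation and the Origin of the self, Lawrence Erlbaum", 72, 100));
        runs.add(new Run("132", 60, 103));
        runs.add(new Run("1994. p 24", 60, 112));

        System.out.println(strictOrder(runs));        // number lands after line 1
        System.out.println(tolerantOrder(runs, 4.0)); // number leads line 1
    }
}
```

In a real program the positions would have to come from PDFBox, for example by collecting TextPosition.getXDirAdj() and getYDirAdj() values in a PDFTextStripper subclass. That part is left out here, and a suitable tolerance would depend on the font sizes in the document.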