Hi,
On Sunday, 22.11.2020, 07:10 +0200, Hesham Gneady wrote:
> I've tried it now, but it made no difference. I actually explained the
> problem incorrectly; here's what actually happens:
> 
> The 1st line in the PDF file is:
> 
> 131 Comments are made from 1905, / See: Certain Neurotic Mechanisms in
> 
> Here "131" is normal text, while the rest of the line has "Subscript"
> formatting. If I copy/paste the line from the PDF manually, it is copied
> in the right order, but when extracting the text using PDFBox, it comes
> out like this:
> 
> Comments are made from 1905, / See: Certain Neurotic Mechanisms in
> 131
> 
> The text is being read before the "131" number.


That's what I get with the -sort option using PDFBox 2.0.21:

131 Comments are made from 1905, / See: Certain Neurotic Mechanisms in
Jealousy, Paranoia, and Homosexuality. (Internat. Journ. Psycho-Analysis, vol. iv,
April, 1923.) Freud, S. / A response to a mother’s concern about her son’s
homosexuality 1935 -Letters of Sigmund Freud. E. L. Freud (Ed.). New York, NY:
Basic Books. P 423. In this letter Freud links homosexuality to ‘arrested
development.’
132 Allan Schore, Affect Regulation and the Origin of the self, Lawrence Erlbaum
1994. p 24
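Why sorting helps here, as far as I understand it: processTextPosition() receives the glyphs in content-stream order, and in this file the subscript text comes before "131". With setSortByPosition(true), the stripper sorts the collected positions before writing them out: text whose vertical extents overlap is treated as one line and ordered left to right, so the slightly lowered subscript run ends up after "131". Below is a minimal, self-contained sketch of that idea in plain Java (16+, for the record). It does not use PDFBox; the Run record, the coordinates and the same-line test are hypothetical simplifications, not PDFBox's actual TextPositionComparator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortByPositionSketch {

    // Hypothetical stand-in for a run of glyphs (NOT PDFBox's TextPosition).
    // y grows downward (top-left origin); yBottom is the baseline.
    record Run(String text, float x, float yBottom, float height) {
        float yTop() {
            return yBottom - height;
        }
    }

    // Simplified same-line test (an assumption, not PDFBox's exact rule):
    // baselines (almost) equal, or one run's baseline lies within the other's vertical extent.
    static boolean sameLine(Run a, Run b) {
        return Math.abs(a.yBottom() - b.yBottom()) < 0.1f
                || (b.yBottom() >= a.yTop() && b.yBottom() <= a.yBottom())
                || (a.yBottom() >= b.yTop() && a.yBottom() <= b.yBottom());
    }

    // Reading order: same line -> left to right; otherwise the higher line comes first.
    static final Comparator<Run> READING_ORDER = (a, b) ->
            sameLine(a, b) ? Float.compare(a.x(), b.x()) : Float.compare(a.yBottom(), b.yBottom());

    // Sorts runs given in content-stream order and joins them into text,
    // using a space within a line and a newline between lines.
    static String toText(List<Run> streamOrder) {
        List<Run> sorted = new ArrayList<>(streamOrder);
        sorted.sort(READING_ORDER);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < sorted.size(); i++) {
            if (i > 0) {
                sb.append(sameLine(sorted.get(i - 1), sorted.get(i)) ? " " : "\n");
            }
            sb.append(sorted.get(i).text());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Content-stream order as described in the thread: the subscript run comes
        // before "131". The coordinates are made up for illustration.
        List<Run> streamOrder = List.of(
                new Run("Comments are made from 1905, / See: Certain Neurotic Mechanisms in", 90f, 103f, 7f),
                new Run("131", 72f, 100f, 10f),
                new Run("Jealousy, Paranoia, and Homosexuality.", 72f, 115f, 7f));
        // Prints "131 Comments ... Mechanisms in" followed by "Jealousy, ..." on the next line.
        System.out.println(toText(streamOrder));
    }
}
```

Note that an overlap-based same-line test is not transitive (one run can overlap two others that do not overlap each other), so a comparator like this does not strictly satisfy the Comparator contract. It is fine as an illustration; for real extraction, calling setSortByPosition(true) on the PDFTextStripper is the way to go.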

BR
Maruan


> Best regards,
>
> Hesham
>
> ------------------------------------------------------------------------------------
>
> Included Message:
>
> On 17.11.20 at 07:54, Hesham Gneady wrote:
>
> > Hi,
> >
> > I am trying to read this PDF file using
> > PDFTextStripper.processTextPosition():
> >
> > https://dl.dropboxusercontent.com/s/o660xrp4sgp9tbv/PDFTextStripper%20reading%20sample.pdf?dl=0
> >
> > But when I do that, it reads the text in the wrong order: it reads the
> > 2nd line before the 1st line because the 1st line has a subscript
> > effect. Is there a way to read it in the right order?
> 
> In a PDF, the text doesn't necessarily appear in rendering order. You
> should give the sort option a try:
>
> org.apache.pdfbox.text.PDFTextStripper.setSortByPosition(boolean)
>
> Andreas
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
> For additional commands, e-mail: users-h...@pdfbox.apache.org
