Hi,

On Saturday, 16.10.2021 at 16:57 +0530, Naganand Kanagal wrote:
> Hi,
> 
> Noticed that PDFTextStripper.getText() returns the last line of a PDF
> document as the first line in a text file.

Text in a PDF isn't necessarily stored in visual order, and by default
getText() follows the order in which the text is stored. You can call
PDFTextStripper.setSortByPosition(true) before getText() to extract
text closer to visual order.
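
To illustrate why stored order and visual order can differ, here is a
minimal sketch that uses only the Java standard library. It does not
call PDFBox; the Fragment class and the coordinates are made up for
illustration, and the real sorting inside PDFTextStripper is more
involved. With PDFBox itself the only change needed is the
setSortByPosition(true) call before getText().

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class VisualOrderSketch {
    // Hypothetical stand-in for a positioned piece of text; not a PDFBox type.
    static final class Fragment {
        final String text;
        final double x;        // distance from the left edge of the page
        final double yFromTop; // distance from the top edge of the page

        Fragment(String text, double x, double yFromTop) {
            this.text = text;
            this.x = x;
            this.yFromTop = yFromTop;
        }
    }

    // Fragments in the order they are stored in the file.
    // The coordinates are invented for this example.
    static List<Fragment> contentStreamOrder() {
        List<Fragment> fragments = new ArrayList<>();
        fragments.add(new Fragment("Kindly refer to my LinkedIn profile", 72, 700));
        fragments.add(new Fragment("Yogesh Dixit", 72, 60));
        fragments.add(new Fragment("Gurgaon, Haryana", 72, 80));
        return fragments;
    }

    // Joins the fragments in the given order, one per line.
    static String join(List<Fragment> fragments) {
        StringBuilder sb = new StringBuilder();
        for (Fragment f : fragments) {
            sb.append(f.text).append('\n');
        }
        return sb.toString();
    }

    // Sorts a copy top-to-bottom, then left-to-right, and joins it.
    static String sortedByPosition(List<Fragment> fragments) {
        List<Fragment> copy = new ArrayList<>(fragments);
        copy.sort(Comparator
                .comparingDouble((Fragment f) -> f.yFromTop)
                .thenComparingDouble(f -> f.x));
        return join(copy);
    }

    public static void main(String[] args) {
        List<Fragment> fragments = contentStreamOrder();
        System.out.println("Stored order:");
        System.out.print(join(fragments));
        System.out.println("Sorted by position:");
        System.out.print(sortedByPosition(fragments));
    }
}
```

In stored order the footer sentence comes first; after sorting by
position the name at the top of the page comes first.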

BR
Maruan


> Since I am pattern-searching the document for "Name", and the name
> is the first line in these documents, this gets me the wrong
> information. Why does the last line become the first line? Is there
> a way to set this right?
> 
> 
> Logfile:
> nio-443-exec-2] ProcessDoc.ProcessDocument (ProcessDocument.java:156)  []
>  - ProcessDocument:readFromPDFFile, txt extracted:
> Kindly refer to my LinkedIn profile for More details related to
> certifications, education etc.
> 
> Yogesh Dixit
>         Gurgaon, Haryana
> 
> 
> The first line in the PDF is Yogesh Dixit. Any help will be appreciated.
> Regards,
> Naganand Kanagal




---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: users-h...@pdfbox.apache.org