Jukka,

As always, sorry for the late reply. I have just tested this, and it now extracts the text just fine.
Best regards,
Hesham

---------------------------------------------
Included message:

> Hi,
>
> On 04/02/2011 03:24 PM, Hesham G. wrote:
>> I have a PDF file that I am extracting data from it using PDFBox
>> v1.5. If i copy text from it manually like: "SUPPLY FAN | G0320
>> B11-14998" to Notepad, it is copied fine ... But in PDFBox it is read
>> like this: "SUPPLY FAN | B11-14998G0320" ... Many other text does the
>> same thing. You can test a 1 page sample PDF here :
>> http://www.4shared.com/document/XDzWQFyY/wrong_extracted_text_sample.html
>
> Enabling the sortByPosition option [1] in the text extraction typically
> helps solve problems like this. See also the equivalent -sort option of
> the ExtractText command [2].
>
> [1]
> http://pdfbox.apache.org/apidocs/org/apache/pdfbox/util/PDFTextStripper.html#setSortByPosition(boolean)
> [2] http://pdfbox.apache.org/commandlineutilities/ExtractText.html
>
> --
> Jukka Zitting
>
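
For anyone finding this thread later, here is a rough Java sketch of why sorting by position can help. It is not PDFBox code: the Fragment class, the coordinates, and the sortByPosition helper are all made up for illustration, and the real PDFTextStripper logic is more involved. The assumption is that the PDF draws "B11-14998" before "G0320" in its content stream, even though "G0320" sits further left on the page.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortByPositionSketch {

    /** Hypothetical text fragment with the page position where it starts. */
    static final class Fragment {
        final String text;
        final double x; // distance from the left edge of the page
        final double y; // distance from the top edge of the page

        Fragment(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /** Joins fragment texts in list order, separated by single spaces. */
    static String join(List<Fragment> fragments) {
        StringBuilder sb = new StringBuilder();
        for (Fragment f : fragments) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(f.text);
        }
        return sb.toString();
    }

    /** Returns a copy ordered top-to-bottom, then left-to-right. */
    static List<Fragment> sortByPosition(List<Fragment> fragments) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        sorted.sort(Comparator
                .comparingDouble((Fragment f) -> f.y)
                .thenComparingDouble(f -> f.x));
        return sorted;
    }

    public static void main(String[] args) {
        // Assumed content-stream order; the coordinates are invented.
        List<Fragment> streamOrder = new ArrayList<>();
        streamOrder.add(new Fragment("SUPPLY FAN |", 50.0, 700.0));
        streamOrder.add(new Fragment("B11-14998", 200.0, 700.0));
        streamOrder.add(new Fragment("G0320", 150.0, 700.0));

        // Prints: SUPPLY FAN | B11-14998 G0320
        System.out.println(join(streamOrder));
        // Prints: SUPPLY FAN | G0320 B11-14998
        System.out.println(join(sortByPosition(streamOrder)));
    }
}
```

With PDFBox itself, the corresponding step is to call setSortByPosition(true) on the PDFTextStripper before extracting the text, as described at link [1] in the quoted message.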

