Hi Jack,

Try extractByArea instead of getText. There is also sample code that demonstrates it.
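To show why extracting by area helps when the page order gets scrambled, here is a minimal sketch of the idea. It does not call PDFBox at all: the Fragment class, the textInRegion helper, and the sample coordinates and strings are all made up for illustration. In PDFBox itself this job belongs to the PDFTextStripperByArea class, which takes named rectangular regions and returns the text found in each; the sketch only mimics that behaviour by keeping the pieces of text whose position falls inside a rectangle and putting them back into reading order.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

public class AreaFilterSketch {

    /** Hypothetical stand-in for a piece of text and its position on the page. */
    static final class Fragment {
        final double x;
        final double y;
        final String text;

        Fragment(double x, double y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    /**
     * Keeps only the fragments inside the region, then orders them
     * top-to-bottom and left-to-right (y is assumed to grow downwards).
     */
    static String textInRegion(List<Fragment> fragments, Rectangle2D region) {
        List<Fragment> inside = new ArrayList<>();
        for (Fragment f : fragments) {
            if (region.contains(f.x, f.y)) {
                inside.add(f);
            }
        }
        inside.sort((a, b) -> a.y != b.y
                ? Double.compare(a.y, b.y)
                : Double.compare(a.x, b.x));

        // Fragments sharing a y value go on one line, separated by a space.
        StringBuilder out = new StringBuilder();
        double lastY = Double.NaN;
        for (Fragment f : inside) {
            if (out.length() > 0) {
                out.append(f.y == lastY ? ' ' : '\n');
            }
            out.append(f.text);
            lastY = f.y;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up fragments, deliberately listed out of reading order.
        List<Fragment> page = new ArrayList<>();
        page.add(new Fragment(50, 700, "Footer heading"));
        page.add(new Fragment(200, 320, "Sold"));
        page.add(new Fragment(50, 300, "Suburb"));
        page.add(new Fragment(50, 320, "Adelaide"));
        page.add(new Fragment(200, 300, "Result"));

        // Only the middle band of the page is of interest.
        Rectangle2D middle = new Rectangle2D.Double(0, 250, 400, 200);
        System.out.println(textInRegion(page, middle));
    }
}
```

With a real PDF the rectangle would have to be found by trial and error for the part of the page you care about, and the coordinates depend on the page size.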
Regards,
Elbin

On Fri, May 27, 2011 at 3:02 PM, Jack Bush <[email protected]> wrote:

> Hi Eric,
>
> Thanks for responding to my call for assistance.
>
> I am extracting text from a PDF file only. The rows of data have been moved
> around and the heading is down at the bottom of the rows of data, possibly
> from a table. The order of the page has also gone out of sync.
>
> Here is an example of the file that I am trying to extract from:
> http://www.homepriceguide.com.au/saturday_auction_results/Adelaide.pdf
>
> I am only interested in the stats in the middle of the page.
>
> Thanks again,
>
> Jack
>
> ----- Original Message ----
> From: Eric Douglas <[email protected]>
> To: [email protected]
> Sent: Fri, 27 May, 2011 12:28:52 AM
> Subject: RE: How to keep PDF format when extracting text
>
> This sounds a bit vague. "PDF format" sounds like you're creating a PDF, but
> your description sounds more like you're getting text from a PDF and trying
> to make it look like it does in the PDF. Are you trying to modify a PDF, or
> are you just losing font information on extracted text?
> Is the font information embedded?
> Do you have any samples of your text extraction code or a PDF you're
> extracting?
>
> -----Original Message-----
> From: Jack Bush [mailto:[email protected]]
> Sent: Thursday, May 26, 2011 10:12 AM
> To: [email protected]
> Subject: How to keep PDF format when extracting text
>
> Hi All,
>
> I have no problem extracting text from a PDF document using
> pdfbox-app-1.5.0.jar but found that the format has been lost. I also
> downloaded fontbox-1.5.0.jar and jempbox-1.5.0.jar but am not sure how to
> use them to make the format of the extracted text file as close to the
> original PDF file as possible.
>
> Are there any good documents on this topic that use recent jars? I found
> some material through Google, but it either uses a much earlier version
> (0.8) of pdfbox or the explanation is insufficient to follow. It is not in
> the PDFBox FAQ.
>
> Do you have an archived mailing list I could look up?
>
> Many thanks,
>
> Jack

--
Thanks & Regards
Elbin K Elias

