Hi Jack

Try extractByArea instead of getText. There is also sample code that
demonstrates it.
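
As a rough illustration of the area-based approach, here is a minimal sketch
written against the PDFBox 1.x API (PDFTextStripperByArea in
org.apache.pdfbox.util; later releases moved it to org.apache.pdfbox.text).
The class name ExtractMiddleOfPage, the helper middleBand, the region name
"stats", and the A4 page size are all assumptions made for the example, not
values taken from the PDF in question, so the rectangle will need adjusting.

```java
import java.awt.geom.Rectangle2D;
import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.util.PDFTextStripperByArea;

public class ExtractMiddleOfPage {

    // Assumed page size in points (A4); replace with the real page size.
    static final float PAGE_WIDTH = 595f;
    static final float PAGE_HEIGHT = 842f;

    // Hypothetical helper: a full-width band covering the middle half of the page.
    static Rectangle2D middleBand(float pageWidth, float pageHeight) {
        return new Rectangle2D.Float(0f, pageHeight / 4f, pageWidth, pageHeight / 2f);
    }

    // Extracts the text that falls inside the band on the first page.
    static String extract(File pdf) throws IOException {
        PDDocument document = PDDocument.load(pdf);
        try {
            PDFTextStripperByArea stripper = new PDFTextStripperByArea();
            // Order text by its position on the page rather than by its
            // order in the content stream.
            stripper.setSortByPosition(true);
            stripper.addRegion("stats", middleBand(PAGE_WIDTH, PAGE_HEIGHT));

            List<?> pages = document.getDocumentCatalog().getAllPages();
            PDPage firstPage = (PDPage) pages.get(0);
            stripper.extractRegions(firstPage);
            return stripper.getTextForRegion("stats");
        } finally {
            document.close();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(extract(new File(args[0])));
    }
}
```

Run it with the PDFBox jar on the classpath and the path to the PDF as the
only argument. Restricting extraction to a region keeps surrounding headings
and footers out of the output, and sorting by position may help when rows
come out in the wrong order.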

Regards
Elbin

On Fri, May 27, 2011 at 3:02 PM, Jack Bush <[email protected]> wrote:

> Hi Eric,
>
> Thanks for responding back to my call for assistance.
>
> I am extracting text from a PDF file only. The rows of data have been moved
> around and the heading is at the bottom of the rows of data, possibly from a
> table. The order of the page has also gone out of sync.
>
> Here is an example of the file that I am trying to extract from:
> http://www.homepriceguide.com.au/saturday_auction_results/Adelaide.pdf
>
> I am only interested in the stats in the middle of the page.
>
> Thanks again,
>
> Jack
> ----- Original Message ----
> From: Eric Douglas <[email protected]>
> To: [email protected]
> Sent: Fri, 27 May, 2011 12:28:52 AM
> Subject: RE: How to keep PDF format when extracting text
>
> This sounds a bit vague. "PDF format" sounds like you're creating a PDF, but
> your description sounds more like you're getting text from a PDF and trying
> to make it look like it does in the PDF. Are you trying to modify a PDF, or
> are you just losing font information on extracted text?
> Is the font information embedded?
> Do you have any samples of your text extraction code or a PDF you're
> extracting?
>
>
> -----Original Message-----
> From: Jack Bush [mailto:[email protected]]
> Sent: Thursday, May 26, 2011 10:12 AM
> To: [email protected]
> Subject: How to keep PDF format when extracting text
>
> Hi All,
>
> I have no problem extracting text from a PDF document using
> pdfbox-app-1.5.0.jar, but found that the format has been lost. I also
> downloaded fontbox-1.5.0.jar and jempbox-1.5.0.jar but am not sure how to use
> them to make the format of the extracted text file as close to the original
> PDF file as possible.
>
> Is there any good documentation on this topic that uses recent jars? I found
> some material through Google, but it either uses a much earlier version (0.8)
> of pdfbox or the explanation is insufficient to follow. It is not in the
> PDFBox FAQ.
>
> Do you have an archived mailing list I could look up?
>
> Many thanks,
>
> Jack
>
>
>


-- 
Thanks & Regards
Elbin K Elias
