Re: PDF parse failing to capture entire text

Jukka Zitting Fri, 11 Jan 2013 05:15:15 -0800

Hi,

On Fri, Jan 4, 2013 at 10:00 PM, Jack Park <[email protected]> wrote:
> The paper itself is found by following the link from here:
> http://openagricola.nal.usda.gov/Record/IND23271089
>
> (I will send the file offlist if needed; it's 64k)


I can take a closer look at the file if you send it to me (I didn't
find a link to download it).

A good test on whether Tika can (or should be able to) extract text
from a PDF is to try copy-pasting the text from a normal PDF viewer.
If you can copy the text, then Tika should be able to extract it (it's
a bug if it doesn't). If you can't (for example if it's a scanned
image), then there's little we can do.
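
If you want to automate that comparison, here is a rough sketch rather
than anything that ships with Tika. It assumes you have pasted a few
phrases from the PDF viewer into a list, and that you already have the
extracted text as a String (for example from the Tika facade's
parseToString method). The class and method names (ExtractionCheck,
missingPhrases) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Tika: reports which phrases copied
// from a PDF viewer do not appear in the extracted text.
final class ExtractionCheck {

    // Collapse whitespace runs and lowercase the text, so differences in
    // line wrapping between the viewer and the extractor are ignored.
    static String normalize(String s) {
        return s.replaceAll("\\s+", " ").trim().toLowerCase(Locale.ROOT);
    }

    // Returns the copied phrases that are absent from the extracted text.
    // An empty result means every phrase you could copy was also extracted.
    static List<String> missingPhrases(String extracted, List<String> copiedPhrases) {
        String haystack = normalize(extracted);
        List<String> missing = new ArrayList<>();
        for (String phrase : copiedPhrases) {
            String needle = normalize(phrase);
            if (!needle.isEmpty() && !haystack.contains(needle)) {
                missing.add(phrase);
            }
        }
        return missing;
    }
}
```

A non-empty result for text you could copy from the viewer would point
to the kind of bug described above. Hyphenation and ligatures can
differ between tools, so treat a mismatch as a hint to look closer, not
as proof.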

BR,

Jukka Zitting
