Hi Folks,
I have a problematic PDF which keeps crashing my Nutch crawl.
I am trying to get all of the data from the PDF, so the content is not
truncated at all.
http://www.who.int/about/who_reform/who-internal-control-framework.pdf
Can someone please check whether they have any issues parsing this document
with Tika 1.6?
I have tried it locally, and it seems OK. If some other folks can confirm
this, then I can isolate the problem to my Nutch crawl.
Thank you
Lewis

-- 
*Lewis*
