Parsing PDF Nutch Achilles heel?

Håvard W. Kongsgård Wed, 25 Jan 2006 07:11:19 -0800

I have been testing different Nutch configurations to see what slows down the fetching process on my servers (Nutch 0.7.1).

My general experience is that the PDF parsing process is Nutch's Achilles heel.

Nutch works fine on older computers, but with the combination of parse-(text|html|pdf) and http.content.limit = -1 (needed to get PDF parsing to work), Nutch sometimes freezes completely.
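
For illustration, the two settings mentioned above would be overridden in conf/nutch-site.xml roughly as follows. This is only a sketch: the other plugin names in plugin.includes are assumed from a typical setup and are not an exact list, and the enclosing root element of the file is left out.

```xml
<!-- Sketch of the overrides described above; goes inside the root element of nutch-site.xml. -->
<property>
  <name>plugin.includes</name>
  <!-- Only the parse-(text|html|pdf) part is taken from this post; the rest is an assumed example. -->
  <value>protocol-http|urlfilter-regex|parse-(text|html|pdf)|index-basic|query-(basic|site|url)</value>
</property>

<property>
  <name>http.content.limit</name>
  <!-- A negative value disables truncation, so whole PDF files are downloaded and handed to the parser. -->
  <value>-1</value>
</property>
```

With no content limit, very large PDF files are fetched in full before parsing, which may be part of why the parse step becomes so heavy.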

Are any improvements to PDF parsing planned for the next version of Nutch (0.8)?
