Parser chokes on some documents

Kyle Gabhart Tue, 31 May 2005 08:15:22 -0700

I have a large number of documents on our intranet (about 1000) that are indexed by nutch (version 0.6). On about 1/3 of those documents I get the following error:

050529 011245 fetch okay, but can't parse PATH_TO_FILE, reason: Content truncated at 65536 bytes. Parser can't handle incomplete msword file.

The same happens on some PDF files.  Any ideas?

-KG
