I have a large number of documents on our intranet (about 1000) that are indexed by Nutch (version 0.6). For about a third of those documents I get the following error:

050529 011245 fetch okay, but can't parse PATH_TO_FILE, reason: Content truncated at 65536 bytes. Parser can't handle incomplete msword file.
The same happens with some PDF files. Any ideas?

-KG

