[Nutch-general] Re: Parser chokes on some documents

quovadis Tue, 31 May 2005 08:49:33 -0700

It's because of the maximum content size limit: content is
truncated at 65536 bytes, so the parser receives incomplete
files. Raise the content.limit values in your site
configuration file.
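
A minimal sketch of such an override, assuming the site
configuration file is conf/nutch-site.xml and that the
documents are fetched over HTTP or from the local file system.
The 10 MB value is only an illustration, not a recommendation.
Place the properties inside the same root element that your
nutch-default.xml uses:

```xml
<!-- Illustrative value: 10485760 bytes = 10 MB.
     The limit reported in the error above is 65536 bytes. -->
<property>
  <name>http.content.limit</name>
  <value>10485760</value>
</property>

<!-- Only relevant if documents are fetched via file: URLs. -->
<property>
  <name>file.content.limit</name>
  <value>10485760</value>
</property>
```

Documents that were already fetched were most likely stored
truncated, so they would probably need to be fetched again
before they can be parsed.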



On Tue, 31 May 2005 10:11:02 -0500
 "Kyle Gabhart" <[EMAIL PROTECTED]> wrote:
> I have a large number of documents on our intranet (about
> 1000) that are indexed by nutch (version 0.6).  On about
> 1/3 of those documents I get the following error:
> 
> 050529 011245 fetch okay, but can't parse PATH_TO_FILE,
> reason: Content truncated at 65536 bytes. Parser can't
> handle incomplete msword file. 
> The same happens on some PDF files.  Any ideas?
> 
> -KG
> 
> 

