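It's because of the maximum content size. Nutch truncates each fetched document at the configured limit (65536 bytes in your log), and the msword and PDF parsers can't handle a truncated file. Raise the content.limit values in your site configuration file.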
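
A minimal sketch of the override, assuming the site file is conf/nutch-site.xml, that your version uses the <nutch-conf> root element, and that the relevant properties are named http.content.limit and file.content.limit (check nutch-default.xml in your install for the exact names, and use whichever matches the protocol your documents are fetched with). The 10 MB value is only an illustration:

```xml
<?xml version="1.0"?>
<!-- conf/nutch-site.xml: values here override nutch-default.xml.
     Property names and root element are assumptions; verify them
     against the nutch-default.xml shipped with your version. -->
<nutch-conf>

  <!-- Limit for documents fetched over HTTP (the log shows 65536). -->
  <property>
    <name>http.content.limit</name>
    <value>10485760</value>
  </property>

  <!-- Limit for documents fetched via file: URLs. -->
  <property>
    <name>file.content.limit</name>
    <value>10485760</value>
  </property>

</nutch-conf>
```

Documents that were already fetched were stored truncated, so they will most likely need to be refetched before they parse correctly.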

On Tue, 31 May 2005 10:11:02 -0500 "Kyle Gabhart" <[EMAIL PROTECTED]> wrote:
> I have a large number of documents on our intranet (about 1000) that are indexed by nutch (version 0.6). On about 1/3 of those documents I get the following error:
>
> 050529 011245 fetch okay, but can't parse PATH_TO_FILE, reason: Content truncated at 65536 bytes. Parser can't handle incomplete msword file.
>
> The same happens on some PDF files. Any ideas?
>
> -KG
