Hi Michael,

The default value for the content limit in nutch-default.xml is 65536
bytes. It is set by these properties:

  http.content.limit
  file.content.limit
  ftp.content.limit

So regardless of the file size, the download is truncated at this
limit, and the PDF parser then rejects the incomplete file. To allow
parsing of files that exceed the limit, copy the three properties above
into nutch-site.xml and increase them to the size you need.

- Ravi Chintakunta

On 3/24/06, Michael Ji <[EMAIL PROTECTED]> wrote:
> Hi there,
>
> I got the following errors:
>
> 060324 095216 http.max.delays = 10000
> 060324 095217 fetch okay, but can't parse
> http://www.ucis.pitt.edu/cwes/papers/work_papers/wp6_2005.pdf,
> reason: failed(2,202): Content truncated at 69266
> bytes. Parser can't handle incomplete pdf file.
>
> It seems fetching is successful, but parsing is not. I
> expanded delays to 10000; is that still not enough?
>
> thanks,
>
> Michael

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
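
For reference, the override described in the reply might look roughly
like the fragment below. This is only a sketch: the value 1048576
(1 MiB) is an arbitrary illustration, not a recommendation, and the
<property> elements are assumed to go inside whatever root element your
nutch-site.xml already uses (copy the layout from nutch-default.xml for
your release).

```xml
<!-- Sketch of overrides for nutch-site.xml; the value is illustrative only. -->
<!-- Place these inside the existing root element of the file. -->

<property>
  <name>http.content.limit</name>
  <!-- Maximum bytes kept per HTTP download; the default is 65536. -->
  <value>1048576</value>
</property>

<property>
  <name>file.content.limit</name>
  <!-- Same limit for content read through the file protocol. -->
  <value>1048576</value>
</property>

<property>
  <name>ftp.content.limit</name>
  <!-- Same limit for content fetched over FTP. -->
  <value>1048576</value>
</property>
```

Pick a value larger than the biggest document you expect to parse; the
PDF in the quoted log was cut off at 69266 bytes, so its full size is
not known from the message.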
