Hi Michael,

The default content limit in nutch-default.xml is 65536 bytes. It is set by these three properties:

http.content.limit
file.content.limit
ftp.content.limit

Regardless of a file's actual size, the download is truncated at this value. Raising http.max.delays will not help, because the error comes from truncation rather than from fetch delays. To parse files that exceed the limit, copy the three properties above into nutch-site.xml and increase them to the value you need.

- Ravi Chintakunta

On 3/24/06, Michael Ji <[EMAIL PROTECTED]> wrote:
> Hi there,
>
> I got the following errors:
>
> 060324 095216 http.max.delays = 10000
> 060324 095217 fetch okay, but can't parse
> http://www.ucis.pitt.edu/cwes/papers/work_papers/wp6_2005.pdf,
> reason: failed(2,202): Content truncated at 69266
> bytes. Parser can't handle incomplete pdf file.
>
> Fetching seems to succeed, but parsing does not. I
> increased the delays to 10000; is that still not enough?
>
> thanks,
>
> Michael
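
For reference, the nutch-site.xml override described in the reply might look roughly like the sketch below. It assumes the same property/name/value layout that nutch-default.xml uses, and the value 1048576 (1 MiB) is only an illustrative choice; pick a limit larger than the biggest file you expect to parse. Copying the three entries straight from your own nutch-default.xml is the safest way to get the exact format for your Nutch version.

```xml
<!-- Sketch only: place these inside the existing root element of nutch-site.xml. -->
<!-- 1048576 bytes (1 MiB) is an arbitrary example value, not a recommendation. -->
<property>
  <name>http.content.limit</name>
  <value>1048576</value>
</property>
<property>
  <name>file.content.limit</name>
  <value>1048576</value>
</property>
<property>
  <name>ftp.content.limit</name>
  <value>1048576</value>
</property>
```

The copy of the PDF that was already fetched was truncated at the old limit, so the URL would presumably need to be fetched again before the parser can see the complete file.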
