Hi Michael,

The default content limit in nutch-default.xml is 65536 bytes.
It is set by these properties:

http.content.limit
file.content.limit
ftp.content.limit

So regardless of the file size, the download is limited to this value,
and the parser is handed a truncated file.

To allow parsing of files that exceed this limit, copy the three
properties above into nutch-site.xml and increase them to the value
you need.
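
As a rough sketch, the overrides in nutch-site.xml could look like the
fragment below. The value 1048576 (1 MB) is only an illustrative
assumption, so pick whatever fits your largest documents. The
<property> elements go inside the root element that your nutch-site.xml
already has. If I remember the property descriptions in
nutch-default.xml correctly, a negative value turns truncation off
entirely, but please check that against your version.

```xml
<!-- Overrides for the defaults in nutch-default.xml.
     1048576 is an example value, not a recommendation. -->
<property>
  <name>http.content.limit</name>
  <value>1048576</value>
</property>

<property>
  <name>file.content.limit</name>
  <value>1048576</value>
</property>

<property>
  <name>ftp.content.limit</name>
  <value>1048576</value>
</property>
```

Pages that were already fetched with the old limit stay truncated, so
they need to be fetched again before they will parse.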


- Ravi Chintakunta



On 3/24/06, Michael Ji <[EMAIL PROTECTED]> wrote:
> Hi there,
>
> I got the following errors:
>
> 060324 095216 http.max.delays = 10000
> 060324 095217 fetch okay, but can't parse
> http://www.ucis.pitt.edu/cwes/papers/work_papers/wp6_2005.pdf,
> reason: failed(2,202): Content truncated at 69266
> bytes. Parser can't handle incomplete pdf file.
>
> It seems fetching succeeds but parsing does not. I
> increased the delays to 10000; is that still not enough?
>
> thanks,
>
> Michael


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
