I set it to 0; there are some big PDFs on the sites I am crawling.
Thanks, Jeff.

-----Original Message-----
From: Jeff Ritchie [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, February 28, 2006 4:37 PM
To: [email protected]
Subject: Re: PDF Parse Error


In nutch-site.xml, raise http.content.limit to something like:

<property>
<name>http.content.limit</name>
<value>655360</value>
</property>
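A minimal Java sketch of what a byte cap like `http.content.limit` does to a download. The class `ContentLimit` and method `applyLimit` are hypothetical, not Nutch's own code. The rule that a limit of 0 or below disables truncation is an assumption based on the Nutch 0.7-era description of this property; later releases document -1 as "no limit" and treat 0 as a real cap, so check the description in your own nutch-default.xml before relying on 0.

```java
import java.util.Arrays;

/**
 * Hypothetical helper illustrating a content-length cap like http.content.limit.
 * Assumption: a limit of zero or below means "no truncation" (the Nutch 0.7-era
 * description); later releases reserve -1 for "unlimited".
 */
public class ContentLimit {

    /** Returns the content unchanged, or only its first {@code limit} bytes. */
    static byte[] applyLimit(byte[] content, int limit) {
        if (limit <= 0 || content.length <= limit) {
            return content; // no cap configured, or the content already fits
        }
        return Arrays.copyOf(content, limit); // keep only the first `limit` bytes
    }

    public static void main(String[] args) {
        byte[] pdf = new byte[1_000_000]; // stand-in for a 1 MB PDF download
        System.out.println(applyLimit(pdf, 65536).length);  // 65536: truncated
        System.out.println(applyLimit(pdf, 655360).length); // 655360: still truncated
        System.out.println(applyLimit(pdf, 0).length);      // 1000000: no cap (assumed semantics)
    }
}
```

Even 655360 bytes truncates a 1 MB file, so a crawl with many large PDFs may need a higher value or no cap at all.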

Jeff.


Richard Braman wrote:

>I get the following errors when parsing PDFs:
> 
>060228 160518 fetch okay, but can't parse
>http://taxpros.marylandtaxes.com/publications/revenews/archives/spr05_hi.pdf,
>reason: failed(2,202): Content truncated at 66005 bytes. Parser
>can't handle incomplete pdf file.
> 
>060228 160354 fetch okay, but can't parse 
>http://www.mstc.state.ms.us/info/stats/transfer/tran0704.pdf, reason:
>failed(2,0): Can't be handled as pdf document. 
>java.lang.NullPointerException
> 
>060228 160518 fetch okay, but can't parse
>http://www.dor.state.nc.us/downloads/corp_archive/03archive/NC478_Instructions.pdf,
>reason: failed(2,0): Can't be handled as pdf document.
>java.io.IOException: You do not have permission to extract text
> 
>I have a number of errors like this in my log, mostly the "content
>truncated" one.
> 
>The thing is, these files all open fine in Acrobat.
>
>Richard Braman
>mailto:[EMAIL PROTECTED]
>561.748.4002 (voice)
>
>http://www.taxcodesoftware.org
>Free Open Source Tax Software
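Why truncation breaks PDF parsing in particular: a PDF keeps its cross-reference data and trailer at the end of the file, finished by an `%%EOF` marker, so a body cut off at the content limit loses the part a parser usually reads first. The sketch below is a hypothetical heuristic (`PdfTailCheck.hasEofMarker` is not part of Nutch or any PDF library) that looks for that marker within the last few bytes; a file with a long run of trailing junk after `%%EOF` can fool it, so treat it as a quick check only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical heuristic: a complete PDF ends with an "%%EOF" marker after its
 * trailer, so a body cut off at a byte limit usually lacks it. A sketch only,
 * not the check Nutch or any PDF parser actually performs.
 */
public class PdfTailCheck {

    private static final byte[] EOF_MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    /** True if "%%EOF" starts within the last {@code window} bytes of {@code data}. */
    static boolean hasEofMarker(byte[] data, int window) {
        int start = Math.max(0, data.length - window);
        // Scan backwards from the last position where the marker could fit.
        for (int i = data.length - EOF_MARKER.length; i >= start; i--) {
            boolean match = true;
            for (int j = 0; j < EOF_MARKER.length; j++) {
                if (data[i + j] != EOF_MARKER[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] whole = "%PDF-1.4\n...\ntrailer\n<<>>\nstartxref\n0\n%%EOF\n"
                .getBytes(StandardCharsets.US_ASCII);
        byte[] cut = Arrays.copyOf(whole, 20); // simulate a truncated download
        System.out.println(hasEofMarker(whole, 1024)); // true
        System.out.println(hasEofMarker(cut, 1024));   // false
    }
}
```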



_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers