I set it to 0; there are some big PDFs on the sites I am crawling. Thanks, Jeff.
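Since most of the failures in the log are the truncation kind, a quick tally per reason helps to see what is left after changing the limit. The following is only a rough sketch: it assumes each failure sits on a single log line shaped like the ones quoted below, and the names `PARSE_FAILURE`, `classify`, and `summarize` are made up for illustration, not part of Nutch.

```python
import re
from collections import Counter

# Assumed line shape, taken from the quoted fetcher output:
#   "<date> <time> fetch okay, but can't parse <url>, reason: failed(<a>,<b>): <message>"
PARSE_FAILURE = re.compile(
    r"fetch okay, but can't parse\s+(?P<url>\S+), reason: "
    r"failed\((?P<major>\d+),(?P<minor>\d+)\): (?P<message>.*)"
)


def classify(message):
    """Bucket a failure message into one of three coarse categories."""
    if message.startswith("Content truncated"):
        return "truncated"   # download cut off by the content limit
    if "permission to extract text" in message:
        return "protected"   # PDF forbids text extraction
    return "other"           # anything else, e.g. a NullPointerException


def summarize(lines):
    """Count parse failures per category; lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = PARSE_FAILURE.search(line)
        if match:
            counts[classify(match.group("message"))] += 1
    return counts
```

Passing the lines of the fetcher log to `summarize` gives a `Counter` keyed by `truncated`, `protected`, and `other`; only the first bucket should shrink when `http.content.limit` is raised.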

-----Original Message-----
From: Jeff Ritchie [mailto:[EMAIL PROTECTED]
Sent: Tuesday, February 28, 2006 4:37 PM
To: [email protected]
Subject: Re: PDF Parse Error

In nutch-site.xml

Set it to something like

<property>
  <name>http.content.limit</name>
  <value>655360</value>
</property>

Jeff.

Richard Braman wrote:

>I get the following errors regarding pdf:
>
>060228 160518 fetch okay, but can't parse
>http://taxpros.marylandtaxes.com/publications/revenews/archives/spr05_hi.pdf,
>reason: failed(2,202): Content truncated at 66005 bytes. Parser
>can't handle incomplete pdf file.
>
>060228 160354 fetch okay, but can't parse
>http://www.mstc.state.ms.us/info/stats/transfer/tran0704.pdf, reason:
>failed(2,0): Can't be handled as pdf document.
>java.lang.NullPointerException
>
>060228 160518 fetch okay, but can't parse
>http://www.dor.state.nc.us/downloads/corp_archive/03archive/NC478_Instructions.pdf,
>reason: failed(2,0): Can't be handled as pdf document.
>java.io.IOException: You do not have permission to extract text
>
>I have a number of errors like this in my log, mostly the content
>truncated one.
>
>The thing is these files all open fine in acrobat.
>
>Richard Braman
>mailto:[EMAIL PROTECTED]
>561.748.4002 (voice)
>
>http://www.taxcodesoftware.org
>Free Open Source Tax Software

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
