I believe these errors are due to a parsing bug in PDFBox that has been
fixed since the 0.7.2 release.  Please give the nightly build (it should be a
drop-in replacement) a try from http://www.pdfbox.org/dist and let me know
if you are still having issues.

Ben



On Tue, 28 Feb 2006, Richard Braman wrote:

> I get the following errors regarding pdf:
>
> 060228 160518 fetch okay, but can't parse
> http://taxpros.marylandtaxes.com/publications/revenews/archives/spr05_hi.pdf,
> reason: failed(2,202): Content truncated at 66005 bytes. Parser
> can't handle incomplete pdf file.
>
> 060228 160354 fetch okay, but can't parse
> http://www.mstc.state.ms.us/info/stats/transfer/tran0704.pdf, reason:
> failed(2,0): Can't be handled as pdf document.
> java.lang.NullPointerException
>
> 060228 160518 fetch okay, but can't parse
> http://www.dor.state.nc.us/downloads/corp_archive/03archive/NC478_Instructions.pdf,
> reason: failed(2,0): Can't be handled as pdf document.
> java.io.IOException: You do not have permission to extract text
>
> I have a number of errors like this in my log, mostly the content
> truncated one.
>
> The thing is, these files all open fine in Acrobat.
>
>
>
> Richard Braman
> mailto:[EMAIL PROTECTED]
> 561.748.4002 (voice)
>
> http://www.taxcodesoftware.org
> Free Open Source Tax Software
>
>
>


_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
