Yes, the NPE should be fixed.
Ben
Richard Braman wrote:
Hi Ben,
We actually got to the bottom of all of them except for one. The content
truncation was due to an inconsistency bug in the Nutch config.
The "no permission to extract text" error is actually accurate: the author, the NC
Department of Revenue, put this restriction on all of their files (I have
asked them to remove it, as it hampers public accessibility). The null
pointer exception is the only one left to deal with, and it may be due to the
parsing bug. Is this the one that you are referring to?
-----Original Message-----
From: Ben Litchfield [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 02, 2006 4:07 PM
To: Richard Braman
Cc: [email protected]; [email protected];
[EMAIL PROTECTED]
Subject: Re: [PDFBox-user] PDF Parse Error
I believe these errors are due to a parsing bug in PDFBox that has been
fixed since the 0.7.2 release. Please give the nightly build (it should be
a drop-in replacement) a try from http://www.pdfbox.org/dist and let me
know if you are still having issues.
Ben
On Tue, 28 Feb 2006, Richard Braman wrote:
I get the following errors regarding pdf:
060228 160518 fetch okay, but can't parse
http://taxpros.marylandtaxes.com/publications/revenews/archives/spr05_hi.pdf,
reason: failed(2,202): Content truncated at 66005 bytes. Parser
can't handle incomplete pdf file.
060228 160354 fetch okay, but can't parse
http://www.mstc.state.ms.us/info/stats/transfer/tran0704.pdf, reason:
failed(2,0): Can't be handled as pdf document.
java.lang.NullPointerException
060228 160518 fetch okay, but can't parse
http://www.dor.state.nc.us/downloads/corp_archive/03archive/NC478_Instructions.pdf,
reason: failed(2,0): Can't be handled as pdf document.
java.io.IOException: You do not have permission to extract text
I have a number of errors like these in my log, mostly the content
truncated one.
The thing is, these files all open fine in Acrobat.
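
To see how the three failure types quoted above are distributed across a larger log, a small stand-alone tally can help. This is only a rough sketch: ParseFailureTally, classify and tally are hypothetical names, not part of Nutch or PDFBox, and it assumes each log entry has already been joined into a single string (the entries above wrap over several lines).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical triage helper; not part of Nutch or PDFBox.
final class ParseFailureTally {

    // Classify one log entry by the failure text quoted in this thread.
    // Assumes the whole entry (URL, reason, exception) is one string.
    static String classify(String entry) {
        if (entry.contains("Content truncated")) {
            return "truncated";
        }
        if (entry.contains("You do not have permission to extract text")) {
            return "no-extract-permission";
        }
        if (entry.contains("NullPointerException")) {
            return "null-pointer";
        }
        return "other";
    }

    // Count entries per category, keeping first-seen order.
    static Map<String, Integer> tally(List<String> entries) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String entry : entries) {
            counts.merge(classify(entry), 1, Integer::sum);
        }
        return counts;
    }
}
```

Matching on the message text rather than on the failed(...) codes is a deliberate simplification, since two of the three entries above share the same failed(2,0) code.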
Richard Braman
mailto:[EMAIL PROTECTED]
561.748.4002 (voice)
http://www.taxcodesoftware.org Free
Open Source Tax Software
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers