I've installed PDFBox 0.7.3 on Gentoo Linux via Portage. Gentoo installs a
'pdfextracttext' wrapper that calls Java with the right classpath, etc. I'm
using Java 1.5, but have also tried Java 1.6. I've written a script to
download local police reports and convert them from PDF to TXT, and it works
fine on about 200 of the 225 reports available this year. On the other 25,
I get a NullPointerException:

$ pdfextracttext 011608.pdf
Exception in thread "main" java.lang.NullPointerException
        at org.pdfbox.ExtractText.main(ExtractText.java:208)


http://www.co.ho.md.us/Police/DOCS/011608.pdf is an example of a PDF that
causes the NPE.

http://www.co.ho.md.us/Police/DOCS/011008.pdf is one (from the same week)
that works fine.

I don't see anything obviously unusual in the PDFs (special characters,
graphics, etc.); both display fine in Acrobat Reader.

Any ideas?

Thanks,
Scott
