I've installed PDFBox 0.7.3 on Gentoo Linux via Portage. Gentoo installs a 'pdfextracttext' wrapper that calls Java with the right classpath. I'm using Java 1.5, but have tried Java 1.6 as well. I've written a script to download local police reports and convert them from PDF to TXT, and it works fine on about 200 of the 225 reports available this year. On the other 25, I'm getting NullPointerExceptions:
$ pdfextracttext 011608.pdf
Exception in thread "main" java.lang.NullPointerException
        at org.pdfbox.ExtractText.main(ExtractText.java:208)

http://www.co.ho.md.us/Police/DOCS/011608.pdf is an example of a PDF that causes the NPE.
http://www.co.ho.md.us/Police/DOCS/011008.pdf is one (from the same week) that works fine.

I don't see anything obviously different in the failing PDF (special characters, graphics, etc.); both view fine in Acrobat Reader. Any ideas?

Thanks,
Scott