On Fri, 27 Jun 2014, Daniel Gibby wrote:
> java.io.IOException: Error: Header doesn't contain versioninfo
>     at org.apache.pdfbox.pdfparser.PDFParser.parseHeader(PDFParser.java:335)
>     at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:177)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1238)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1203)
>     at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:111)
>     ...
> Shouldn't this be a TikaException of some type, or at least something other
> than just an IOException?
One option might be to catch the IOException in the Tika code, then
re-throw it as a TikaException. However, I'd prefer it if we could get
the PDFBox project to throw a more specific exception for this case,
which we could then catch and re-throw as a TikaException. I'm not sure
we want to be catching all PDFBox IOExceptions, as that might mask a
genuine I/O failure.
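
A rough sketch of the second approach, just to illustrate the idea. Note
that MalformedPdfException is hypothetical (PDFBox has no such class as far
as I know), PdfLoader is a made-up stand-in for the PDDocument.load call,
and the nested TikaException only mimics the real
org.apache.tika.exception.TikaException so that the snippet compiles on its
own:

```java
import java.io.IOException;

public class PdfExceptionWrapping {

    // Stand-in for org.apache.tika.exception.TikaException, so the sketch
    // compiles without Tika on the classpath.
    public static class TikaException extends Exception {
        public TikaException(String msg, Throwable cause) {
            super(msg, cause);
        }
    }

    // Hypothetical: a more specific exception PDFBox could throw for a
    // malformed file. Not an existing PDFBox class.
    public static class MalformedPdfException extends IOException {
        public MalformedPdfException(String msg) {
            super(msg);
        }
    }

    // Hypothetical stand-in for the PDDocument.load(...) call.
    public interface PdfLoader {
        void load() throws IOException;
    }

    // Only the specific "bad file" exception is translated; any other
    // IOException (e.g. a failing stream) propagates unchanged.
    public static void parse(PdfLoader loader) throws IOException, TikaException {
        try {
            loader.load();
        } catch (MalformedPdfException e) {
            throw new TikaException("Unable to parse PDF: " + e.getMessage(), e);
        }
    }
}
```

Catching plain IOException in that catch block instead would give the first
option, at the cost of wrapping real I/O errors as well.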
Nick