IOException should be something more specific?

Daniel Gibby Tue, 01 Jul 2014 11:21:02 -0700

Using Tika 1.5 (the latest release, which uses PDFBox), I'm seeing the following IOException when parsing certain PDFs.


java.io.IOException: Error: Header doesn't contain versioninfo
   at org.apache.pdfbox.pdfparser.PDFParser.parseHeader(PDFParser.java:335)
   at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:177)
   at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1238)
   at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1203)
   at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:111)
...

Should this be something more specific than just an IOException, so that Tika can know whether to let it bubble up as an IOException or encapsulate it in a TikaException?

I don't know enough about the PDFBox project to know whether it ever throws anything besides IOExceptions. Perhaps there could be a PDFParseException or something like that for known situations. But if IOExceptions only ever happen in known situations, then Tika could rely on that and wrap any IOException from PDFBox in a TikaException.
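To make the first option concrete, here is a rough sketch of what I have in mind. None of these classes exist in PDFBox or Tika: PdfParseException, DocumentException (standing in for TikaException), and the load/parse methods are all hypothetical names, and the header check is only a toy. The point is that a parse-specific subclass of IOException lets the caller wrap malformed-document errors while real I/O failures still bubble up unchanged.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class PdfExceptionSketch {

    // Hypothetical: a parse-specific subclass, so existing callers that
    // catch IOException keep working.
    static class PdfParseException extends IOException {
        PdfParseException(String message) { super(message); }
    }

    // Hypothetical stand-in for a wrapper such as TikaException.
    static class DocumentException extends Exception {
        DocumentException(String message, Throwable cause) { super(message, cause); }
    }

    // Toy loader (not the PDFBox API): a missing source is a genuine I/O
    // problem, a bad header is a problem with the document itself.
    static void load(byte[] data) throws IOException {
        if (data == null) {
            throw new IOException("no data could be read");
        }
        String header = new String(data, 0, Math.min(data.length, 5),
                StandardCharsets.ISO_8859_1);
        if (!header.startsWith("%PDF-")) {
            throw new PdfParseException("Header doesn't contain versioninfo");
        }
    }

    // Caller side: wrap only the parse-specific failure; any other
    // IOException propagates as-is.
    static void parse(byte[] data) throws IOException, DocumentException {
        try {
            load(data);
        } catch (PdfParseException e) {
            throw new DocumentException("Malformed PDF", e);
        }
    }

    // Small helper that reports which path was taken.
    static String classify(byte[] data) {
        try {
            parse(data);
            return "ok";
        } catch (DocumentException e) {
            return "malformed";
        } catch (IOException e) {
            return "io";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("%PDF-1.4".getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println(classify("hello".getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println(classify(null));
    }
}
```

Because the hypothetical PdfParseException extends IOException, code that catches IOException today would not need to change.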


What do you think?

Thanks,
Daniel Gibby
