On 7/1/2014 12:04 PM, Jukka Zitting wrote:

The TaggedInputStream class [1] was designed for such cases where we
want to distinguish between IOExceptions thrown by the underlying
InputStream and those thrown by the library processing the stream. It
can be used like this:

     TaggedInputStream tagged = new TaggedInputStream(stream);
     try {
         parse(tagged);
     } catch (IOException e) {
         tagged.throwIfCauseOf(e); // throws IOException if from stream
         throw new TikaException("Parse error", e);
     }

[1] http://tika.apache.org/1.0/api/org/apache/tika/io/TaggedInputStream.html

BR,

Jukka Zitting

This sounds like a good approach for Tika to take. I have still asked the PDFBox developers about this. If PDFBox throws something other than an IOException, will the code in the throwIfCauseOf method need to be adjusted?

In this case, I'm parsing a PDF, so I'll be using something like this:

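// Given: ByteArrayInputStream stream, BodyContentHandler textHandler,
// Metadata metadata, ParseContext parseContext
TaggedInputStream tagged = new TaggedInputStream(stream);
try {
    PDFParser parser = new PDFParser();
    // Parse through the tagged wrapper, not the raw stream, so that
    // failures of the stream itself can be told apart from parser failures.
    parser.parse(tagged, textHandler, metadata, parseContext);
} catch (IOException e) {
    tagged.throwIfCauseOf(e); // rethrows the IOException if it came from the stream
    throw new TikaException("Parse error", e);
}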
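
For anyone following the thread, here is a minimal, self-contained sketch of the tagging idea as I understand it. It is not Tika's actual source: TaggingSketch, SketchTaggedInputStream, the nested TaggedIOException, parse, classify and brokenStream are all hypothetical names made up for illustration, and it uses only the JDK. The wrapper marks every IOException raised by the underlying stream with a reference to itself, so the caller can later check whether a caught exception carries that mark.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical, simplified illustration of exception tagging; not Tika's code.
class TaggingSketch {

    // Wraps an IOException and remembers which wrapper stream produced it.
    static class TaggedIOException extends IOException {
        final Object tag;

        TaggedIOException(IOException cause, Object tag) {
            super(cause.getMessage(), cause);
            this.tag = tag;
        }
    }

    // Stream wrapper that tags every IOException thrown by the wrapped stream.
    static class SketchTaggedInputStream extends FilterInputStream {
        SketchTaggedInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            try {
                return in.read();
            } catch (IOException e) {
                throw new TaggedIOException(e, this);
            }
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            try {
                return in.read(b, off, len);
            } catch (IOException e) {
                throw new TaggedIOException(e, this);
            }
        }

        // Rethrows the original IOException only if this wrapper tagged it.
        void throwIfCauseOf(Exception e) throws IOException {
            if (e instanceof TaggedIOException && ((TaggedIOException) e).tag == this) {
                throw (IOException) e.getCause();
            }
        }
    }

    // Hypothetical parser: reads one byte, then optionally fails on its own.
    static void parse(InputStream in, boolean failInParser) throws IOException {
        in.read();
        if (failInParser) {
            throw new IOException("malformed document");
        }
    }

    // Hypothetical stream whose reads always fail, standing in for a bad source.
    static InputStream brokenStream() {
        return new InputStream() {
            @Override
            public int read() throws IOException {
                throw new IOException("disk failure");
            }
        };
    }

    // Reports where a failure came from: the stream or the parser.
    static String classify(InputStream raw, boolean failInParser) {
        SketchTaggedInputStream tagged = new SketchTaggedInputStream(raw);
        try {
            try {
                parse(tagged, failInParser);
                return "ok";
            } catch (IOException e) {
                tagged.throwIfCauseOf(e); // stream failures leave through here
                return "parser error: " + e.getMessage();
            }
        } catch (IOException fromStream) {
            return "stream error: " + fromStream.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(new ByteArrayInputStream(new byte[] {1}), false));
        System.out.println(classify(new ByteArrayInputStream(new byte[] {1}), true));
        System.out.println(classify(brokenStream(), false));
    }
}
```

In this sketch the parser has to read from the wrapper rather than from the raw stream. Otherwise no exception is ever tagged and every failure looks like a parser error.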

Thanks,
Daniel Gibby
