On 7/1/2014 12:04 PM, Jukka Zitting wrote:
The TaggedInputStream class [1] was designed for such cases where we
want to distinguish between IOExceptions thrown by the underlying
InputStream and those thrown by the library processing the stream. It
can be used like this:
    TaggedInputStream tagged = new TaggedInputStream(stream);
    try {
        parse(tagged);
    } catch (IOException e) {
        tagged.throwIfCauseOf(e); // throws IOException if from stream
        throw new TikaException("Parse error", e);
    }
[1] http://tika.apache.org/1.0/api/org/apache/tika/io/TaggedInputStream.html
BR,
Jukka Zitting
This sounds like a good approach for Tika to take. I have still asked
PDFBox about this. If they throw something other than an IOException,
will the underlying code in the throwIfCauseOf method need to be adjusted?
In this case, I'm parsing a PDF, so I'll be using something like this:
    // given ByteArrayInputStream stream, BodyContentHandler textHandler,
    // Metadata metadata, ParseContext parseContext:
    TaggedInputStream tagged = new TaggedInputStream(stream);
    try {
        PDFParser parser = new PDFParser();
        // pass the tagged wrapper, not the raw stream, so that read
        // failures can be recognized in the catch block
        parser.parse(tagged, textHandler, metadata, parseContext);
    } catch (IOException e) {
        tagged.throwIfCauseOf(e); // rethrows if the stream itself failed
        throw new TikaException("Parse error", e);
    }
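
To make the tagging idea concrete, here is a stripped-down, self-contained sketch of how such a wrapper can tell the two kinds of IOException apart. It is not Tika's actual implementation: the names TaggingSketch, TaggingStream, TaggedIOException, Parser and classify are made up for this example, and only the general technique (wrap the stream's own IOExceptions with a marker, then check for that marker later) is taken from the description above.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class TaggingSketch {

    /** Hypothetical marker exception: remembers which wrapper produced it. */
    static class TaggedIOException extends IOException {
        final Object tag;

        TaggedIOException(IOException cause, Object tag) {
            super(cause.getMessage(), cause);
            this.tag = tag;
        }
    }

    /** Simplified stand-in for a tagging wrapper (assumption, not Tika code). */
    static class TaggingStream extends FilterInputStream {
        TaggingStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            try {
                return in.read();
            } catch (IOException e) {
                throw new TaggedIOException(e, this); // tag stream failures
            }
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            try {
                return in.read(b, off, len);
            } catch (IOException e) {
                throw new TaggedIOException(e, this); // tag stream failures
            }
        }

        /** Rethrows the original IOException only if this wrapper tagged it. */
        void throwIfCauseOf(Exception e) throws IOException {
            if (e instanceof TaggedIOException && ((TaggedIOException) e).tag == this) {
                throw (IOException) e.getCause();
            }
        }
    }

    /** Hypothetical parser interface, standing in for a real parsing library. */
    interface Parser {
        void parse(InputStream in) throws IOException;
    }

    /** Reports where an IOException came from: the stream or the parser. */
    static String classify(InputStream raw, Parser parser) {
        TaggingStream tagged = new TaggingStream(raw);
        try {
            try {
                parser.parse(tagged);
                return "ok";
            } catch (IOException e) {
                tagged.throwIfCauseOf(e); // leaves this block if the stream failed
                return "parser error";    // untagged: raised by the parser itself
            }
        } catch (IOException streamFailure) {
            return "stream error";
        }
    }

    public static void main(String[] args) {
        InputStream broken = new InputStream() {
            @Override
            public int read() throws IOException {
                throw new IOException("disk failure");
            }
        };
        InputStream healthy = new ByteArrayInputStream(new byte[] {1, 2, 3});

        // The stream fails while the parser is reading from it.
        System.out.println(classify(broken, in -> in.read()));
        // The stream is fine, but the parser rejects the content.
        System.out.println(classify(healthy, in -> {
            in.read();
            throw new IOException("malformed document");
        }));
    }
}
```

In this sketch only IOExceptions raised while reading the wrapped stream are ever tagged. An exception of any other type thrown by the parsing library would never match the check in throwIfCauseOf, so it would need its own catch clause rather than a change to the wrapper.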
Thanks,
Daniel Gibby