Hello, I am using Tika not only to extract text content from files (embedded or not); I also need to extract the embedded files to a separate location on the file system so that they are accessible as top-level files.
Because the parsed files can be very big, I do this extraction in a streaming fashion: inside my custom recursiveParser, a ParserDecorator based on the idea from the Tika wiki, I wrap each embedded file's stream in my own ByteCollectionInputStreamWrapper, an InputStream decorator that writes the file's bytes to a disk location while the Tika parsers are reading them. This works fine and is very efficient.
The problem arises when a parsing error occurs, for example when Tika tries to parse a password-protected archive. The org.apache.tika.parser.Parser#parse method fails with an exception, reading from the file's InputStream stops, and so the copy extracted to the separate location is left incomplete.
Does anyone have an idea how to collect the remaining bytes in such error situations?
Regards,
Vjeran
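
For reference, here is a minimal sketch of the kind of stream decorator described above, using only the JDK. The class name TeeToFileInputStream and the drain() method are hypothetical, chosen for illustration; this is not the actual ByteCollectionInputStreamWrapper. The drain() method shows one possible way to finish the copy after a failed parse, under the assumption that the underlying stream is still open and readable at that point, which may not hold for every Tika parser.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical decorator: copies every byte read through it into a file. */
class TeeToFileInputStream extends FilterInputStream {
    private final OutputStream sink;

    TeeToFileInputStream(InputStream in, Path target) throws IOException {
        super(in);
        this.sink = Files.newOutputStream(target);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) sink.write(b);
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) sink.write(buf, off, n);
        return n;
    }

    /** Skips by reading, so skipped bytes still reach the file. */
    @Override
    public long skip(long n) throws IOException {
        byte[] buf = new byte[8192];
        long done = 0;
        while (done < n) {
            int r = read(buf, 0, (int) Math.min(buf.length, n - done));
            if (r < 0) break;
            done += r;
        }
        return done;
    }

    // Mark/reset is disabled so that no byte can be copied twice.
    @Override
    public boolean markSupported() { return false; }

    @Override
    public void mark(int readLimit) { /* intentionally a no-op */ }

    @Override
    public void reset() throws IOException {
        throw new IOException("mark/reset not supported");
    }

    /** Reads whatever the consumer left unread so the copy is complete. */
    void drain() throws IOException {
        byte[] buf = new byte[8192];
        while (read(buf, 0, buf.length) != -1) {
            // read() already wrote the bytes to the sink
        }
    }

    @Override
    public void close() throws IOException {
        try { sink.close(); } finally { super.close(); }
    }
}
```

With a wrapper like this, the call to the delegate parser could be placed in a try block and drain() called in the catch or finally block before the exception is rethrown, so the rest of the stream is still copied. If the failing parser has already closed the stream, or the stream is supplied by a container parser that cannot continue after the error, drain() will fail or produce a short copy.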
