Does anyone have a workaround for reusing an input stream after it has
been passed to Tika's detect?

Inline comments in the Tika code suggest the developers understand how
to use an InputStream correctly (i.e. call mark() before reading and
reset() afterwards), but the implementation doesn't appear to match
the comments.
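For reference, here is a minimal sketch of the mark/reset pattern the comments describe. sniff is a hypothetical stand-in for a detector, not Tika's API: it marks the stream, reads at most limit bytes, then resets so the caller can read from the start again. Wrapping in a BufferedInputStream is the usual way to get mark support on a stream that lacks it (readAllBytes assumes Java 9+).

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MarkResetDemo {

    // Hypothetical stand-in for a content sniffer (NOT Tika's API):
    // mark, read at most `limit` bytes, then reset so the caller can
    // still read the stream from its original position.
    static byte[] sniff(InputStream in, int limit) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(limit);                      // remember the current position
        try {
            byte[] head = new byte[limit];
            int n = 0;
            while (n < limit) {
                int r = in.read(head, n, limit - n);
                if (r == -1) break;          // stream shorter than limit
                n += r;
            }
            return Arrays.copyOf(head, n);
        } finally {
            in.reset();                      // rewind to the marked position
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "%PDF-1.4 example body".getBytes(StandardCharsets.US_ASCII);
        // BufferedInputStream provides mark/reset even if the wrapped stream does not.
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
        byte[] head = sniff(in, 5);
        byte[] all = in.readAllBytes();      // full content is still available
        System.out.println(new String(head, StandardCharsets.US_ASCII)); // %PDF-
        System.out.println(Arrays.equals(all, data));                     // true
    }
}
```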

I've written a sample test,
https://gist.github.com/nhojpatrick/ac01a3b3d791364b26f8

Compile it, then run: java -cp ./ TikaDetectTester [path to file to test]

It writes a copy of the original to *.out1, runs detect, and then
writes the file out again to *.out2 using the same input stream that
was given to Tika.
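The tester's round trip can be sketched in memory like this, with byte arrays standing in for the *.out1/*.out2 files. detect here is a hypothetical stand-in, not Tika's API. The assumption is that, matching the behaviour described above, it wraps a stream lacking mark support in its own private BufferedInputStream, so any bytes that buffer pulls in never reach the caller. The workaround it suggests is wrapping the stream in a BufferedInputStream yourself before calling detect, assuming the real detector only wraps when markSupported() is false.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ReuseAfterDetect {

    // Hypothetical detector (NOT Tika's API). ASSUMPTION: without mark
    // support it wraps the stream in its own BufferedInputStream, which
    // can read ahead and swallow bytes the caller will never see.
    static String detect(InputStream in) throws IOException {
        InputStream s = in.markSupported() ? in : new BufferedInputStream(in);
        s.mark(8);
        byte[] head = new byte[4];
        int n = s.read(head, 0, head.length);
        s.reset();                            // only rewinds `s`, not the caller's stream
        boolean pdf = n == 4 && new String(head, StandardCharsets.US_ASCII).equals("%PDF");
        return pdf ? "application/pdf" : "application/octet-stream";
    }

    // Hides mark support, like a raw FileInputStream.
    static InputStream noMark(InputStream in) {
        return new FilterInputStream(in) {
            @Override public boolean markSupported() { return false; }
        };
    }

    // Mirrors the tester: detect, then reuse the same stream for *.out2.
    static byte[] roundTrip(InputStream in) throws IOException {
        detect(in);
        return in.readAllBytes();             // Java 9+
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "%PDF-1.4 example body".getBytes(StandardCharsets.US_ASCII);
        byte[] raw = roundTrip(noMark(new ByteArrayInputStream(original)));
        byte[] wrapped = roundTrip(new BufferedInputStream(noMark(new ByteArrayInputStream(original))));
        System.out.println(Arrays.equals(raw, original));     // false
        System.out.println(Arrays.equals(wrapped, original)); // true
    }
}
```

With the raw stream the second copy comes back empty; with the caller-owned BufferedInputStream it matches the original.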

Cheers,
John
