Have you tried using BodyContentHandler? For example:

    ...
    ContentHandler handler = new BodyContentHandler();
    parser.parse(inputStream, handler, metadata, context);
    // ByteArrayInputStream takes a byte[], not a String, so encode the text first
    InputStream charStream = new ByteArrayInputStream(handler.toString().getBytes(StandardCharsets.UTF_8));
    ...
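If buffering the whole document in memory is a concern, the characters can also be piped to an InputStream as they arrive. Below is a rough sketch of that idea using only JDK classes. StreamingTextHandler and PipedTextDemo are hypothetical names made up for this example, not Tika classes, and the hard-coded characters(..) call stands in for parser.parse(..). The producer has to run on its own thread, because a piped stream blocks when its small internal buffer is full. I believe BodyContentHandler also has constructors that take a Writer or an OutputStream, which would probably replace the custom handler here, but check the Javadoc for your Tika version.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: forwards every characters(..) call to a Writer. */
class StreamingTextHandler extends DefaultHandler {
    private final Writer out;

    StreamingTextHandler(Writer out) {
        this.out = out;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        try {
            out.write(ch, start, length);
        } catch (IOException e) {
            throw new SAXException("Could not write extracted text", e);
        }
    }

    @Override
    public void endDocument() throws SAXException {
        try {
            // Closing the writer signals end-of-stream to the reading side.
            out.close();
        } catch (IOException e) {
            throw new SAXException("Could not close text stream", e);
        }
    }
}

class PipedTextDemo {
    static String run() throws Exception {
        // The legacy code would consume this InputStream.
        PipedInputStream in = new PipedInputStream();
        Writer writer = new OutputStreamWriter(new PipedOutputStream(in), StandardCharsets.UTF_8);
        StreamingTextHandler handler = new StreamingTextHandler(writer);

        // Producer thread: in real code this would call
        // parser.parse(inputStream, handler, metadata, context).
        Thread producer = new Thread(() -> {
            try {
                char[] text = "Hello, Tika".toCharArray();
                handler.startDocument();
                handler.characters(text, 0, text.length);
                handler.endDocument();
            } catch (SAXException e) {
                throw new RuntimeException(e);
            }
        });
        producer.start();

        // Consumer side: drain the stream until the writer is closed.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        producer.join();
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

One caveat with this sketch: if parsing fails before endDocument() is called, the writer is never closed, so real code should close it in a finally block on the producer thread.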
Regards,
Wade

On Wed, Apr 25, 2012 at 12:08 PM, Alec Swan <[email protected]> wrote:
> Hello,
>
> We are replacing another text extraction library with Tika. We have
> legacy code which expects document text to be output as an
> InputStream. I understand that this is not directly related to Tika,
> but I am assuming that other Tika users already solved this problem.
>
> Does anybody have any sample code or ideas that will help us pipe
> chars in ContentHandler#characters(..) method to a stream? Is there an
> existing ContentHandler implementation that does this already?
>
> Thanks,
>
> Alec
