Thanks Wade, but "handler.toString()" is not going to work for us because of our memory restrictions: it holds the entire extracted text in memory at once. We ended up using BodyContentHandler(PipedOutputStream) and connecting that output stream to a PipedInputStream, which effectively gave us what we needed.
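
For reference, the piping approach can be sketched roughly as follows. This is an illustrative sketch, not the code from this thread: the class name PipedTextExtraction, the TextProducer hook, and the 8192-byte pipe size are all assumptions made for the example. It uses only the JDK, so the extraction step is passed in as a callback; with Tika, that callback would be where parser.parse(...) writes through a BodyContentHandler wrapped around the output stream. The writer has to run on its own thread, because a piped stream pair used from a single thread can deadlock once the pipe buffer fills up.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

final class PipedTextExtraction {

    /**
     * Hypothetical hook for the extraction step (not a Tika type).
     * Whatever is written to 'out' becomes readable from the returned InputStream.
     */
    interface TextProducer {
        void writeTo(OutputStream out) throws Exception;
    }

    /** Runs the producer on a background thread and returns the reading end of the pipe. */
    static InputStream asInputStream(TextProducer producer) throws IOException {
        // Only this many bytes are buffered at a time; the writer blocks when the pipe is full.
        PipedInputStream in = new PipedInputStream(8192);
        PipedOutputStream out = new PipedOutputStream(in);

        Thread worker = new Thread(() -> {
            try {
                producer.writeTo(out);
            } catch (Exception e) {
                // Sketch only: a real implementation should hand this failure to the reader.
                e.printStackTrace();
            } finally {
                // Closing the write end is what lets the reader see end-of-stream.
                try {
                    out.close();
                } catch (IOException ignored) {
                    // Nothing useful to do if close fails.
                }
            }
        }, "text-extraction");
        worker.setDaemon(true);
        worker.start();
        return in;
    }
}
```

With Tika on the classpath, the call would presumably look something like asInputStream(out -> parser.parse(inputStream, new BodyContentHandler(out), metadata, context)). It is worth checking which character encoding your Tika version uses for the BodyContentHandler(OutputStream) constructor, since the legacy code reading the InputStream has to decode with the same one.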
Thanks!

Alec

On Thu, Apr 26, 2012 at 6:14 AM, Taylor, Wade <[email protected]> wrote:
> have you tried using BodyContentHandler? for example:
>
> ...
> ContentHandler handler = new BodyContentHandler();
> parser.parse(inputStream, handler, metadata, context);
> InputStream charStream = new ByteArrayInputStream(handler.toString().getBytes("UTF-8"));
> ...
>
> Regards,
> Wade
>
> On Wed, Apr 25, 2012 at 12:08 PM, Alec Swan <[email protected]> wrote:
>>
>> Hello,
>>
>> We are replacing another text extraction library with Tika. We have
>> legacy code which expects document text to be output as an
>> InputStream. I understand that this is not directly related to Tika,
>> but I am assuming that other Tika users already solved this problem.
>>
>> Does anybody have any sample code or ideas that will help us pipe
>> chars in ContentHandler#characters(..) method to a stream? Is there an
>> existing ContentHandler implementation that does this already?
>>
>> Thanks,
>>
>> Alec
