On Feb 5, 2009, at 1:22 AM, Jukka Zitting wrote:
Hi,
On Thu, Feb 5, 2009 at 3:02 AM, Jonathan Koren <jonat...@soe.ucsc.edu> wrote:
What I really want is someone to tell me how to get back a usable
stream of plaintext. Whether this involves a radical change to Tika's
ContentHandler class or some trick with Java, I really don't care, as
long as it's single thread safe.
Have you looked at the ParsingReader class? It seems like a perfect
match to your needs. The ParsingReader class fires a background thread
to do the parsing and pipes the output so you can control when and how
you want to read the extracted text.
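
For readers unfamiliar with the pattern described above, here is a
minimal sketch of it using only the JDK: a background thread writes
extracted text into a pipe, and the caller pulls from the other end at
its own pace. This is not Tika's code. The class PipedExtractionSketch
and the fakeParse method are hypothetical stand-ins for a real parser.
With Tika itself, the equivalent would presumably be wrapping the
document stream in a ParsingReader and reading from it like any other
java.io.Reader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.Reader;
import java.io.Writer;

class PipedExtractionSketch {

    // Hypothetical stand-in for a parser: emits the "extracted" text
    // one word per line, the way a parser emits text in small pieces.
    static void fakeParse(String document, Writer out) throws IOException {
        for (String word : document.split(" ")) {
            out.write(word);
            out.write('\n');
        }
    }

    // Starts the parse on a background thread and returns the read end
    // of the pipe. The caller decides when and how much to read.
    static Reader openReader(final String document) throws IOException {
        final PipedWriter writer = new PipedWriter();
        PipedReader reader = new PipedReader(writer);
        Thread worker = new Thread(() -> {
            // Closing the writer signals end-of-stream to the reader.
            try (PipedWriter w = writer) {
                fakeParse(document, w);
            } catch (IOException e) {
                // The reader was closed early; nothing left to do here.
            }
        });
        worker.setDaemon(true);
        worker.start();
        return reader;
    }

    // Drains the reader and joins the lines back together with spaces.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(reader)) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append(' ');
            }
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAll(openReader("plain text from a document")));
    }
}
```

The calling code stays single threaded from its own point of view: it
only ever sees a Reader. The pipe's bounded buffer also means the
background thread blocks when the caller stops reading, so extracted
text does not pile up in memory.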
I had no idea that class existed. Thanks.
--
Jonathan Koren
jonat...@soe.ucsc.edu
http://www.soe.ucsc.edu/~jonathan/