Jukka and All - I think a Tika facade would be awesome.
I guess where I mentioned streams, I should be mentioning readers and writers instead.

BTW, how can I insert new text into quoted sections of a message in Nabble?

Regarding having a method that returns a Reader rather than taking a Writer being better for Lucene, for other use cases a Writer might be more convenient (for writing to files, for example). Having a method that takes a Writer would, I think, be more useful than having a method returning a string because it could 1) support sizes larger than memory capacity, 2) easily support output to files, and 3) still support strings (by using a StringWriter).

Speaking of Lucene, I have never used Lucene directly, so I lack the context to understand the Tika/Lucene integration. All my input is from the point of view of someone who just wants to parse text from documents and do things other than text search. So if I neglect to include Lucene in my outlook, rest assured that it is just ignorance and nothing more. ;)

Regarding XHTML, we already support it on the command line. My sense is that Excel spreadsheet parsing would be used more often for structured data than for raw text (that's certainly true for me), so I hope we could keep that. I understand your suggestion to use Poi directly for more sophisticated document handling, though.

Everything else sounded good to me.

Regards,
Keith

--
View this message in context: http://www.nabble.com/Moving-Functionality-from-CLI-to-ParseUtils-tp24337541p24453304.html
Sent from the Apache Tika - Development mailing list archive at Nabble.com.
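
P.S. To make the Writer point concrete, here is a rough sketch of what I have in mind. The class and method names (WriterSketch, extractText, extractTextAsString) are made up for illustration and are not existing Tika API, and the "extraction" is just a character copy standing in for a real parser. It only uses java.io.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

public class WriterSketch {

    // Hypothetical stand-in for a text-extraction method; NOT a Tika API.
    // Characters are copied to the caller's Writer in fixed-size chunks,
    // so the full text never has to fit in memory at once.
    static void extractText(Reader source, Writer out) throws IOException {
        char[] buffer = new char[4096];
        int n;
        while ((n = source.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        out.flush();
    }

    // A String result is still available: pass a StringWriter and
    // read it back. Read failures are rethrown unchecked here only to
    // keep the convenience method's signature simple.
    static String extractTextAsString(Reader source) {
        StringWriter out = new StringWriter();
        try {
            extractText(source, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(extractTextAsString(new StringReader("text from a document")));
    }
}
```

For file output the caller would pass a file-backed Writer (for example one from java.nio.file.Files.newBufferedWriter) to the same extractText method, so one Writer-taking method seems to cover all three cases.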