On Fri, 29 Jul 2011, Pedro Dalcin wrote:
> I'm actually after a way to identify what type of file I'm inputting. I've figured I can't simply check the extension since there are different types of "doc" that use the same extension.

If you know it's a POI-supported file, then passing it to the ExtractorFactory will give you back a suitable simple text extractor for it.

If you don't know what it is, or if you want "fancy" text extraction, then use Apache Tika. That provides detection, and fairly full-featured text extractors.
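
To give a rough idea of what content-based detection means, here is a minimal sketch that looks at the first few bytes of a file instead of its extension. This is not Tika's or POI's code, only an illustration using the plain JDK; the class name MagicSniffer and its methods are made up for the example, and real detectors check far more than these three signatures.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, for illustration only: classifies a file by its leading bytes.
final class MagicSniffer {
    // OLE2 compound document header (legacy binary .doc, .xls, .ppt)
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header (.docx, .xlsx, .pptx are ZIP packages, but so are other formats)
    private static final byte[] ZIP = { 'P', 'K', 0x03, 0x04 };
    // RTF documents start with "{\rtf" and are sometimes saved with a .doc extension
    private static final byte[] RTF = { '{', '\\', 'r', 't', 'f' };

    private MagicSniffer() {}

    static boolean startsWith(byte[] data, byte[] magic) {
        return data.length >= magic.length
            && Arrays.equals(Arrays.copyOf(data, magic.length), magic);
    }

    // Returns a coarse container label based on the leading bytes.
    static String sniff(byte[] head) {
        if (startsWith(head, OLE2)) return "OLE2";
        if (startsWith(head, ZIP)) return "ZIP";
        if (startsWith(head, RTF)) return "RTF";
        return "UNKNOWN";
    }

    // Reads up to 8 bytes from the start of the file and classifies them.
    static String sniff(Path file) throws IOException {
        byte[] head = new byte[8];
        int n = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (n < head.length) {
                int r = in.read(head, n, head.length - n);
                if (r < 0) break; // file is shorter than 8 bytes
                n += r;
            }
        }
        return sniff(Arrays.copyOf(head, n));
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            System.out.println(name + ": " + sniff(Paths.get(name)));
        }
    }
}
```

Note that this only identifies the container: an OLE2 file could be Word, Excel or PowerPoint, and a ZIP file need not be an Office document at all. Telling those apart means looking inside the container, which is the work a proper detector does for you.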

(If you know what it is, and you need full control of the text extraction process, then you'll likely want to write your own code on top of POI.)

Nick
