I know that it's a POI-supported file :-) Thanks, that gave me some directions to follow.
Pedro Dalcin

On Fri, Jul 29, 2011 at 12:02 PM, Nick Burch <[email protected]> wrote:

> On Fri, 29 Jul 2011, Pedro Dalcin wrote:
>
>> I'm actually after a way to identify what type of file I'm inputting. I've
>> figured I can't simply check the extension since there are different types
>> of "doc" that use the same extension.
>
> If you know it's a POI supported file, then passing it to the Extractor
> Factory will give you back a suitable simple text extractor for it
>
> If you don't know what it is, or if you want "fancy" text extraction, then
> use Apache Tika. That provides detection, and fairly full featured text
> extractors.
>
> (If you know what it is, and you need full control of the text extraction
> process, then you'll likely want to write your own code on top of POI)
>
> Nick
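For anyone following the thread: on the point that the extension alone can't tell you what a "doc" really is, here is a rough sketch of a first-pass check on a file's leading bytes, using only the JDK. It is not how POI or Tika do their detection. It only separates the OLE2 container (the legacy binary Office formats) from a ZIP package (the OOXML formats are ZIP-based), and it can't tell a Word file from an Excel file inside the same container; Tika or POI are still needed for that. The class name `ContainerSniffer` and its `sniff` methods are made up for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of POI or Tika: guesses the container
// format of a file from its first bytes rather than from its extension.
final class ContainerSniffer {

    // Signature of an OLE2 compound document (legacy binary Office files).
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Signature of a ZIP local file header (OOXML files are ZIP packages).
    private static final byte[] ZIP = { 0x50, 0x4B, 0x03, 0x04 };

    // Classifies a buffer holding the first bytes of a file.
    static String sniff(byte[] header) {
        if (startsWith(header, OLE2)) {
            return "OLE2";
        }
        if (startsWith(header, ZIP)) {
            return "ZIP";
        }
        return "UNKNOWN";
    }

    // Reads up to 8 bytes from the file and classifies them.
    static String sniff(Path file) throws IOException {
        byte[] header = new byte[8];
        int filled = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (filled < header.length) {
                int read = in.read(header, filled, header.length - filled);
                if (read < 0) {
                    break; // file is shorter than 8 bytes
                }
                filled += read;
            }
        }
        return sniff(Arrays.copyOf(header, filled));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Usage: java ContainerSniffer file1 [file2 ...]
    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.println(arg + ": " + sniff(Paths.get(arg)));
        }
    }
}
```

A ".doc" that comes back as UNKNOWN is probably RTF or plain text with a misleading extension. A result of ZIP only says the file is a ZIP archive, so a renamed .zip would match too.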
