I know that it's a POI supported file :-)
Thanks, that gave me some directions to follow.

Pedro Dalcin


On Fri, Jul 29, 2011 at 12:02 PM, Nick Burch <[email protected]> wrote:

> On Fri, 29 Jul 2011, Pedro Dalcin wrote:
>
>> I'm actually after a way to identify what type of file I'm inputting. I've
>> figured I can't simply check the extension since there are different types
>> of "doc" that use the same extension.
>>
>
> If you know it's a POI supported file, then passing it to the
> ExtractorFactory will give you back a suitable simple text extractor for it.
>
> If you don't know what it is, or if you want "fancy" text extraction, then
> use Apache Tika. That provides detection, and fairly full-featured text
> extractors.
>
> (If you know what it is, and you need full control of the text extraction
> process, then you'll likely want to write your own code on top of POI)
>
>
> Nick
>
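
For anyone reading this later: to illustrate why the extension alone is not enough, here is a rough sketch that looks at the first few bytes of a file instead. It is not how POI or Tika do their detection, and the class and method names (MagicSniffer, sniff) are made up for this example. It only tells apart three well-known signatures: the OLE2 container used by the binary Office formats, the ZIP header used by the OOXML formats, and RTF, which sometimes gets saved with a .doc extension. For anything real, Tika's detection is the better route.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, for illustration only (not part of POI or Tika).
final class MagicSniffer {
    // OLE2 compound document signature (binary .doc, .xls, .ppt).
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header, which OOXML files (.docx, .xlsx, .pptx) start with.
    private static final byte[] ZIP = {'P', 'K', 3, 4};
    // RTF documents start with "{\rtf".
    private static final byte[] RTF = {'{', '\\', 'r', 't', 'f'};

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Classifies a file by its leading bytes; returns a coarse container name.
    static String sniff(byte[] header) {
        if (startsWith(header, OLE2)) return "OLE2";
        if (startsWith(header, ZIP)) return "ZIP";
        if (startsWith(header, RTF)) return "RTF";
        return "UNKNOWN";
    }

    // Reads up to 8 bytes from the file and classifies them.
    static String sniff(Path file) throws IOException {
        byte[] header = new byte[8];
        int n = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (n < header.length) {
                int r = in.read(header, n, header.length - n);
                if (r < 0) break; // end of file before 8 bytes
                n += r;
            }
        }
        return sniff(Arrays.copyOf(header, n));
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            System.out.println(name + ": " + sniff(Paths.get(name)));
        }
    }
}
```

Note that "OLE2" and "ZIP" only identify the container, not the application format: an OLE2 file can be Word, Excel, PowerPoint or something else entirely, and a ZIP file need not be an Office document at all. Telling those apart means looking inside the container, which is the part Tika does for you.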
