Hi,

On Wed, May 25, 2011 at 11:24 AM, Nick Burch <[email protected]> wrote:
> On Tue, 24 May 2011, Christanto Leonardo wrote:
>> What is the minimum jar required to use the best Tika detection can offer?
>
> My hunch is it'd be tika-core, all the tika-core dependencies, tika-parsers,
> poi, and a few bits of commons, but you'd need to do some tests...

Yep. The tika-core jar alone, which has no external dependencies,
already gives you pretty good type detection. For container-aware
detection (needed to accurately tell apart MS Office formats and
other container-based types) you also need the dependencies Nick
outlined. If size is not a constraint, it's probably easiest to
simply use the whole dependency tree.
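
To illustrate why looking inside the container matters, here is a rough
JDK-only sketch. It is not how Tika implements detection, and the class
and helper names are made up for this example. It builds two tiny ZIP
files whose only difference is the name of the entry inside (the part
names used by OOXML word processing and spreadsheet files), then shows
that the leading magic bytes are identical while the entry names are not:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Illustration only: hypothetical names, no Tika classes involved.
class ContainerSniffSketch {

    // Builds a tiny in-memory ZIP holding a single named entry.
    static byte[] zipWithEntry(String entryName) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write("<x/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return buffer.toByteArray();
    }

    // Magic-byte view: only the first four bytes of the data.
    static byte[] magic(byte[] data) {
        return Arrays.copyOf(data, 4);
    }

    // Container view: walk the ZIP entries and look for a well-known part name.
    static String sniffContainer(byte[] data) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(data))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("word/document.xml")) {
                    return "word processing document";
                }
                if (entry.getName().equals("xl/workbook.xml")) {
                    return "spreadsheet";
                }
            }
        }
        return "plain zip";
    }

    public static void main(String[] args) throws IOException {
        byte[] docLike = zipWithEntry("word/document.xml");
        byte[] sheetLike = zipWithEntry("xl/workbook.xml");

        // Both start with the ZIP local file header signature "PK\003\004".
        System.out.println(Arrays.equals(magic(docLike), magic(sheetLike)));

        // Only the container contents tell the two apart.
        System.out.println(sniffContainer(docLike));
        System.out.println(sniffContainer(sheetLike));
    }
}
```

Magic bytes alone can only say "this is a ZIP"; telling the formats
apart requires opening the container, which is why the extra parser
dependencies come into play.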

>> Currently I am using this code to do detection (if this is the best way to
>> do detection):
>>  Detector detector =
>>      new ContainerAwareDetector(MimeTypes.getDefaultMimeTypes());
>>  Tika tika = new Tika(detector);
>>  String mimeType = tika.detect(TikaInputStream.get(in));
>
> Jukka has done a bit of refactoring, so now I think you can use
> CompositeDetector instead of ContainerAwareDetector, and it'll pick up the
> container parsers dynamically for you

Yes, you can just do:

    String mimeType = new Tika().detect(in);

This will automatically find and use all the detectors available on
the classpath, and will even take care of the TikaInputStream wrapping
for you.

BR,

Jukka Zitting
