Re: How can I help Tika to choose the right extractor for my data?

Jukka Zitting Tue, 06 Jul 2010 06:04:24 -0700

Hi,

Sorry for the late response.


On Tue, Jun 29, 2010 at 2:53 AM, zabrane Mikael <[email protected]> wrote:
> I learned that's possible to help (or advice) Tika to choose the right
> extractor for a document if I have for example its MimeType.
> Am exactly in this case. For each document in my collection, I know its
> MimeType.
> How one can apply this idea guys (code snippet please)?

You'll want to include the media type in the input metadata that you
pass to the parsing process, like this:

    Metadata metadata = new Metadata();
    metadata.set(Metadata.CONTENT_TYPE, knownType);

    Parser parser = new AutoDetectParser();
    parser.parse(..., metadata, ...);
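For reference, a fuller version of the same idea might look like the sketch below. It assumes tika-core and tika-parsers are on the classpath and uses the four-argument parse() method that takes a ParseContext. The class name KnownTypeExample, the extractText helper, and the sample input are made up for illustration. Note that the type you set is treated as a hint to the detection step, so treat this as a starting point rather than a guarantee of which parser ends up being used.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;

public class KnownTypeExample {

    // Hypothetical helper: parses the given bytes and passes knownType
    // to Tika as a Content-Type hint in the input metadata.
    static String extractText(byte[] data, String knownType) throws Exception {
        Metadata metadata = new Metadata();
        metadata.set(Metadata.CONTENT_TYPE, knownType);

        Parser parser = new AutoDetectParser();
        // Collects the plain-text body of the parsed document.
        BodyContentHandler handler = new BodyContentHandler();
        try (InputStream stream = new ByteArrayInputStream(data)) {
            parser.parse(stream, handler, metadata, new ParseContext());
        }
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        // Sample input made up for this sketch.
        byte[] data = "Hello, Tika!".getBytes(StandardCharsets.UTF_8);
        String text = extractText(data, "text/plain");
        System.out.println(text.trim());
    }
}
```

In your case you would replace the sample bytes with a stream over each document and pass the MimeType you already know for it.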

> Finally, does someone know when Tika-0.8 will be released?

At the current pace, I expect it to be out sometime this quarter.

BR,

Jukka Zitting
