Hi,

Sorry for the late response.

On Tue, Jun 29, 2010 at 2:53 AM, zabrane Mikael <[email protected]> wrote:
> I learned that's possible to help (or advice) Tika to choose the right
> extractor for a document if I have for example its MimeType.
> Am exactly in this case. For each document in my collection, I know its
> MimeType.
> How one can apply this idea guys (code snippet please)?

You'll want to pass the media type as part of the input metadata that
you give to the parsing process, like this:

    Metadata metadata = new Metadata();
    // knownType is the media type you already know for this document
    metadata.set(Metadata.CONTENT_TYPE, knownType);
    Parser parser = new AutoDetectParser();
    parser.parse(..., metadata, ...);

> Finally, does someone know when Tika-0.8 will be released?

At the current pace I expect it to be out sometime in this quarter.

BR,

Jukka Zitting
