On Fri, 8 Oct 2010, Jan Høydahl / Cominvent wrote:
Magic is most often great, but I generally prefer to have some way of explicitly telling the software what to do :)

That's very much available to you! See the different constructors of AutoDetectParser for examples of how to control which detector is used, which parsers get used, etc.
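For example, something along these lines builds an AutoDetectParser with an explicitly chosen detector and an explicit list of parsers, instead of relying on the no-argument constructor. It's a sketch assuming a reasonably recent Tika (tika-core plus the parsers jar on the classpath); `ExplicitParsers`, `buildParser` and `extractText` are just illustrative names, and TXTParser is only an arbitrary choice of "the one parser I want":

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MimeTypes;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.txt.TXTParser;
import org.apache.tika.sax.BodyContentHandler;

public class ExplicitParsers {

    // Explicitly choose both the detector and the parsers, rather than
    // letting the no-arg constructor pick up everything on the classpath.
    static AutoDetectParser buildParser() {
        return new AutoDetectParser(
                MimeTypes.getDefaultMimeTypes(), // detector: Tika's MIME magic registry
                new TXTParser());                // the only parser we allow
    }

    // Extract plain text from a string using the restricted parser.
    static String extractText(String content) throws Exception {
        AutoDetectParser parser = buildParser();
        BodyContentHandler handler = new BodyContentHandler();
        try (InputStream in = new ByteArrayInputStream(
                content.getBytes(StandardCharsets.UTF_8))) {
            parser.parse(in, handler, new Metadata(), new ParseContext());
        }
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extractText("Hello, Tika!"));
    }
}
```

Anything the detector identifies as a type none of the listed parsers handles simply won't get parsed, which is the point: you decide what's in the list.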

Now you discover that you prefer another parser for some of the formats which the 3rd party plugin "hijacked". You can't modify their source code, so how do you tell Tika this?

At that point your use case is probably sufficiently different from the default that you shouldn't be using the no-argument AutoDetectParser constructor!

I propose an optional config file which, if found, overrides the specified MIME type mappings, provided of course that the specified class is found and says it supports that MIME type.

Or you could just have a regular Tika config file, and list in there only the parsers you're interested in using?
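To illustrate the config-file route, here's a rough sketch assuming the Tika 1.x/2.x style config format, where a `<parser>` entry can carry `<mime>` elements restricting it to particular types. The XML is inlined as a string only to keep the example self-contained; normally it would live in its own tika-config.xml and be loaded from there. `ConfiguredParsers`, `buildParser` and `supports` are hypothetical names:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.tika.config.TikaConfig;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;

public class ConfiguredParsers {

    // Contents of a hypothetical tika-config.xml that lists only the
    // parsers we want; anything not listed is not used.
    static final String CONFIG_XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
          + "<properties>\n"
          + "  <parsers>\n"
          + "    <parser class=\"org.apache.tika.parser.txt.TXTParser\">\n"
          + "      <mime>text/plain</mime>\n"
          + "    </parser>\n"
          + "  </parsers>\n"
          + "</properties>\n";

    // Load the config and build an AutoDetectParser from it.
    static AutoDetectParser buildParser() throws Exception {
        try (InputStream in = new ByteArrayInputStream(
                CONFIG_XML.getBytes(StandardCharsets.UTF_8))) {
            TikaConfig config = new TikaConfig(in);
            return new AutoDetectParser(config);
        }
    }

    // Report whether the configured parser handles the given MIME type.
    static boolean supports(String mimeType) throws Exception {
        return buildParser()
                .getSupportedTypes(new ParseContext())
                .contains(MediaType.parse(mimeType));
    }

    public static void main(String[] args) throws Exception {
        System.out.println("text/plain supported: " + supports("text/plain"));
        System.out.println("application/pdf supported: " + supports("application/pdf"));
    }
}
```

Because only the listed parser is registered, a plugin's parser sitting on the classpath never gets a look-in for the types you care about.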

Nick
