Thanks. That means it's possible. So I guess we're down to classpath wizardry,
unless the service framework has some features to control the order of loading.
Magic is most often great, but I generally prefer to have some way of
explicitly telling the software what to do :)
A good example is 3rd party parser vendors. Say company X specializes in
parsing multimedia, and releases a Tika plugin version of their parser to give
existing Tika users a very easy upgrade path to their (licensed, non-free)
product. Their plugin registers for a bunch of image and video mime-types.
Now you discover that you prefer another parser for some of the formats which
the 3rd party plugin "hijacked". You can't modify their source code, so how do
you tell Tika this?
I propose an optional config file which, if found, overrides which parser is
used for the mime types it lists - provided the specified class is found and
says it supports the mime type, of course.
<tika-mime-mappings>
  <mappings clear-all="false">
    <mapping mime="application/word"
             class="org.apache.tika.parser.microsoft.OfficeParser" />
    <mapping mime="application/excel"
             class="3rdparty.vendor.tika.OfficeParser" />
  </mappings>
</tika-mime-mappings>
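To make the intended semantics concrete, here is a rough sketch of how such a file could be applied. It uses only the JDK and is not Tika code: the class name MimeMappingSketch, the method applyOverrides, and the "available" map (parser class name to the mime types it claims to support) are all made up for illustration, and treating clear-all="true" as "start from an empty table" is my assumption.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch: none of these names are part of the Tika API.
class MimeMappingSketch {

    /**
     * Applies the <mapping> entries from the config on top of the default
     * mime-type -> parser-class table.
     *
     * @param defaults  mime type -> parser class chosen by normal registration
     * @param available parser class -> mime types it says it supports
     *                  (stands in for "class is found and supports the type")
     * @param xml       contents of the proposed config file
     */
    public static Map<String, String> applyOverrides(
            Map<String, String> defaults,
            Map<String, Set<String>> available,
            String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element mappings = (Element) doc.getElementsByTagName("mappings").item(0);

        // Assumption: clear-all="true" discards the default table first.
        boolean clearAll = Boolean.parseBoolean(mappings.getAttribute("clear-all"));
        Map<String, String> result = new LinkedHashMap<String, String>();
        if (!clearAll) {
            result.putAll(defaults);
        }

        NodeList nodes = mappings.getElementsByTagName("mapping");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element mapping = (Element) nodes.item(i);
            String mime = mapping.getAttribute("mime");
            String parserClass = mapping.getAttribute("class");
            Set<String> supported = available.get(parserClass);
            // Only override when the class exists and claims the mime type.
            if (supported != null && supported.contains(mime)) {
                result.put(mime, parserClass);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> defaults = new LinkedHashMap<String, String>();
        defaults.put("application/word", "vendor.OfficeParser");
        defaults.put("application/excel", "vendor.OfficeParser");

        Map<String, Set<String>> available = new HashMap<String, Set<String>>();
        available.put("builtin.OfficeParser",
                new HashSet<String>(Arrays.asList("application/word", "application/excel")));

        String xml = "<tika-mime-mappings><mappings clear-all=\"false\">"
                + "<mapping mime=\"application/word\" class=\"builtin.OfficeParser\"/>"
                + "</mappings></tika-mime-mappings>";

        System.out.println(applyOverrides(defaults, available, xml));
    }
}
```

In this sketch a mapping that names a missing class, or a class that does not claim the mime type, is silently ignored and the default stays in place. A real implementation would probably want to log a warning in that case.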
Good or bad idea?
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
On 8 Oct 2010, at 11:11, Nick Burch wrote:
> On Fri, 8 Oct 2010, Jan Høydahl / Cominvent wrote:
>> My question was for a very specific use case which is easy to do by a small
>> source code modification but perhaps harder to do with configuration only.
>
> Looking at the AutoDetectParser source code, the last parser registered for a
> given mime type wins. So, if you have your custom word parser register after
> the built in one, then your custom one gets used.
>
> You might find that if you're using the service file method of listing the
> parsers to load, then you just need to get your custom parser jar file to
> sort lexicographically after the main tika parsers jar, but that's one to test.
> (The service registry we use is the javax.imageio one)
>
> Nick