Julien Nioche wrote:
Well ... let's consider this: in the past we used to put things
under /lib/ when they were being used by more than a few plugins.
Then we started using library-only plugins (e.g. lib-xml,
lib-nekohtml, etc). There is a mechanism that allows us to export
any classes from a plugin so that they are visible to the rest of
the framework.
It looks to me like we could be better off by putting all parts of
Tika in a single plugin, and then in Nutch core use a new extension
point just for the purpose of mimetype detection. This facade
(MimeDetectors) would use the Tika plugin if available, or some
other (null?) mechanism otherwise. At the same time Tika would be
happy to configure itself having all tika-core and parsers available
under the same classloader, and it would define two extension points
- one for mimetype detection, and another for parsing. What do you
think?
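
A minimal sketch of the facade idea above, assuming a hypothetical
MimeDetector extension interface and a MimeDetectors facade that takes
the already-discovered plugin implementations as a list. None of these
names are existing Nutch or Tika APIs, and the plugin lookup itself is
left out. Core code would see only plain strings, never Tika classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical extension-point interface; not an existing Nutch API. */
interface MimeDetector {
    /** Returns a MIME type string, or null if this detector cannot decide. */
    String detect(String url, byte[] content);
}

/** The "null" mechanism: used when no detector plugin is installed. */
final class NullMimeDetector implements MimeDetector {
    @Override
    public String detect(String url, byte[] content) {
        return null; // never decides
    }
}

/** Facade for the core: hides which plugin (Tika or other) does the work. */
final class MimeDetectors {
    /** Assumed default when no detector can decide. */
    static final String DEFAULT_TYPE = "application/octet-stream";

    private final List<MimeDetector> detectors;

    /** pluginDetectors: implementations found via the extension point (lookup omitted). */
    MimeDetectors(List<MimeDetector> pluginDetectors) {
        this.detectors = new ArrayList<>(pluginDetectors);
        if (this.detectors.isEmpty()) {
            this.detectors.add(new NullMimeDetector());
        }
    }

    /** Asks each detector in order; falls back to DEFAULT_TYPE if none decides. */
    String detect(String url, byte[] content) {
        for (MimeDetector detector : detectors) {
            String type = detector.detect(url, content);
            if (type != null) {
                return type;
            }
        }
        return DEFAULT_TYPE;
    }
}
```

A Tika-backed plugin would then implement MimeDetector inside its own
classloader and convert Tika's MimeType to a string before returning.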
I haven't looked yet at the way extension points work, so I don't really
have an idea of how difficult this would be. Some of Tika's classes
(mostly MimeType) are used explicitly in several places in the core.
Would we need to hide them behind non-Tika objects in order not to have
direct dependencies?
Yes, that was my idea.
I suppose we could try to make progress on the Tika plugin as it is now
(i.e. with the workaround I described earlier) and refactor things at a
later stage using the extension points. Makes sense?
We could, but if we can figure out a cleaner solution now, then we
should follow it instead of committing that workaround and then having
to refactor it ...
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com