Julien Nioche wrote:

    Well ... let's consider this: in the past we used to put things
    under /lib/ when they were being used by more than a few plugins.
    Then we started using library-only plugins (e.g. lib-xml,
    lib-nekohtml, etc). There is a mechanism that allows us to export
    any classes from a plugin so that they are visible to the rest of
    the framework.

    It looks to me like we could be better off by putting all parts of
    Tika in a single plugin, and then in Nutch core use a new extension
    point just for the purpose of mimetype detection. This facade
    (MimeDetectors) would use the Tika plugin if available, or some
    other (null?) mechanism otherwise. At the same time Tika would be
    happy to configure itself having all tika-core and parsers available
    under the same classloader, and it would define two extension points
    - one for mimetype detection, and another for parsing. What do you
    think?


I haven't looked yet at the way extension points work, so I don't really have an idea of how difficult this would be. Some of Tika's classes (mostly MimeType) are used explicitly in several places in the core. Would we need to hide them behind non-Tika objects in order to avoid direct dependencies?

Yes, that was my idea.
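
To make the facade idea a bit more concrete, here is a rough sketch of how the core could stay free of direct Tika dependencies. Everything in it is hypothetical: DetectedType, MimeDetector and the demo class are made-up names, only the MimeDetectors name comes from the proposal above, and the "plugin" detector is a trivial stand-in rather than real Tika or Nutch plugin code. The actual extension point wiring is left out entirely.

```java
import java.util.Optional;

// Hypothetical core-side value type, so that core code never has to
// reference Tika's MimeType class directly.
final class DetectedType {
    static final DetectedType UNKNOWN = new DetectedType("application/octet-stream");

    private final String name;

    DetectedType(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

// Hypothetical contract that a mimetype-detection extension would implement.
interface MimeDetector {
    DetectedType detect(String url, byte[] content);
}

// Facade used by the core: delegates to a plugin-provided detector when one
// is available, and otherwise falls back to a "null" detector.
final class MimeDetectors {
    private final MimeDetector delegate;

    MimeDetectors(Optional<MimeDetector> pluginDetector) {
        // The fallback always reports the generic binary type.
        this.delegate = pluginDetector.orElse((url, content) -> DetectedType.UNKNOWN);
    }

    DetectedType detect(String url, byte[] content) {
        return delegate.detect(url, content);
    }
}

class MimeFacadeDemo {
    public static void main(String[] args) {
        // Stand-in for what a Tika-backed plugin might provide (assumption:
        // a real one would wrap Tika's detection instead of checking suffixes).
        MimeDetector standIn = (url, content) ->
                url.endsWith(".html") ? new DetectedType("text/html") : DetectedType.UNKNOWN;

        MimeDetectors withPlugin = new MimeDetectors(Optional.of(standIn));
        MimeDetectors withoutPlugin = new MimeDetectors(Optional.empty());

        System.out.println(withPlugin.detect("http://example.com/a.html", new byte[0]).getName());
        System.out.println(withoutPlugin.detect("http://example.com/a.html", new byte[0]).getName());
    }
}
```

With something along these lines, core code would only ever see DetectedType and MimeDetectors, and the Tika classes would stay inside the plugin's classloader.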

I suppose we could try to make progress on the Tika plugin as it is now (i.e. with the workaround I described earlier) and refactor things at a later stage using the extension points. Makes sense?

We could, but if we can figure out a cleaner solution now, then we should follow it instead of committing that workaround and then having to refactor it ...

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
