Antony Bowesman wrote:
> I'm looking to use the Nutch parsing framework in a separate Lucene
> project. I'd like to be able to use the existing plugins directory
> structure as-is, so wondered how Nutch sets up the class loading environment
> to find all the jar files in the plugins directories.

There are dedicated class loaders for each plugin. The classpath is
constructed (recursively) based on plugin metadata (plugin.xml).
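
As a rough sketch of that idea (this is not the Nutch code, and every
name in it is hypothetical), assume each plugin.xml has already been
parsed into a small descriptor listing the plugin's own jars and the ids
of the plugins it requires. The classpath for one plugin is then its own
jars plus, recursively, the jars of everything it depends on, and each
plugin gets a URLClassLoader built from that list:

```java
import java.io.File;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; none of these names come from Nutch.
final class PluginClasspathSketch {

    /** Hypothetical stand-in for the data a parsed plugin.xml would provide. */
    static final class Descriptor {
        final List<String> libraries; // jar paths relative to the plugins directory
        final List<String> requires;  // ids of the plugins this one depends on

        Descriptor(List<String> libraries, List<String> requires) {
            this.libraries = libraries;
            this.requires = requires;
        }
    }

    /** Own jars first, then those of the dependencies (depth first), no duplicates. */
    static List<String> classpathFor(String id, Map<String, Descriptor> plugins) {
        Set<String> jars = new LinkedHashSet<>();
        collect(id, plugins, new HashSet<>(), jars);
        return new ArrayList<>(jars);
    }

    private static void collect(String id, Map<String, Descriptor> plugins,
                                Set<String> visited, Set<String> jars) {
        if (!visited.add(id)) {
            return; // already handled; also guards against dependency cycles
        }
        Descriptor d = plugins.get(id);
        if (d == null) {
            throw new IllegalArgumentException("Unknown plugin: " + id);
        }
        jars.addAll(d.libraries);
        for (String dep : d.requires) {
            collect(dep, plugins, visited, jars);
        }
    }

    /** Builds a dedicated class loader for one plugin from its resolved classpath. */
    static URLClassLoader loaderFor(String id, Map<String, Descriptor> plugins,
                                    File pluginsDir, ClassLoader parent) {
        List<URL> urls = new ArrayList<>();
        for (String jar : classpathFor(id, plugins)) {
            try {
                urls.add(new File(pluginsDir, jar).toURI().toURL());
            } catch (MalformedURLException e) {
                throw new IllegalStateException("Bad jar path: " + jar, e);
            }
        }
        return new URLClassLoader(urls.toArray(new URL[0]), parent);
    }

    public static void main(String[] args) throws IOException {
        // Made-up plugin ids and jar names, standing in for parsed plugin.xml files.
        Map<String, Descriptor> plugins = new HashMap<>();
        plugins.put("core", new Descriptor(
                Arrays.asList("core/core.jar"), Collections.<String>emptyList()));
        plugins.put("lib-demo", new Descriptor(
                Arrays.asList("lib-demo/lib-demo.jar"), Arrays.asList("core")));
        plugins.put("parse-demo", new Descriptor(
                Arrays.asList("parse-demo/parse-demo.jar"), Arrays.asList("lib-demo", "core")));

        try (URLClassLoader loader = loaderFor("parse-demo", plugins,
                new File("plugins"), PluginClasspathSketch.class.getClassLoader())) {
            for (URL url : loader.getURLs()) {
                System.out.println(url);
            }
        }
    }
}
```

In a separate Lucene project you would fill the descriptor map by reading
each plugin.xml under your plugins directory, then load the parser
classes through the loader returned for the plugin you need.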

> Any pointers to the Nutch class(es) that do the work?

Check the package org.apache.nutch.plugin (o.a.n.plugin), which contains
most of the general plug-in code.

There's also a recently established project called Apache Tika [1], whose
goal is to put together a generally usable parsing/extraction framework.
It hasn't really gotten off the ground yet, so there is a good chance to
get your voice heard.

[1] http://incubator.apache.org/tika/

-- 
 Sami Siren
