Hi Breno,

On Tue, Jun 2, 2015 at 1:38 AM, <[email protected]> wrote:
> We are indexing several domains for a specific project, which may contain
> duplicated content (e.g. pdf files). The users of the system come from
> different organisations and wonder why the content is not appearing under
> certain domains. It's a usability issue (with a political aftertaste).
> Thanks for explanation.
>
> Yes, I extended Signature, and I'm also able to use it through the
> db.signature.class property, if I pack the class into its own jar and put
> it into nutch/lib. I'd much rather like to include it in our existing
> plugin jar, though.

This is rather strange, as Signatures are part of the *core* codebase, i.e. they live under /src/java and not /src/plugins. Does this make sense?

> I'm not sure what you mean by ".job jar.".

If you build the Nutch source, you'll see /runtime/deploy/nutch.XXX.job. This is the main artifact sent to deployment clusters (JobTracker).

> We have been developing our plugin outside of nutch and placing the
> corresponding jars into a plugin directory together with the plugin.xml. Is
> there any "magic" happening regarding the classpath when one has ant
> building it inside nutch?

Our general plugin documentation is here:
http://wiki.apache.org/nutch/PluginCentral

For the class-loading question specifically, see:
http://wiki.apache.org/nutch/WhatsTheProblemWithPluginsAndClass-loading

This is why I think it is a bit strange that you've implemented your signature as a plugin and not as part of the core codebase.

> Is there a naming convention regarding the plugin name and corresponding
> jar? Do they have to match?

For plugins, their accompanying and required files, and the naming conventions, please see:
http://wiki.apache.org/nutch/WritingPluginExample

> The reason behind developing our plugin outside of nutch and decoupling
> the build environment is to make updates of nutch easier. That way we can
> simply download the binary release and overlay our plugin. I realize now
> this seems to be a little off the usual way of writing plugins for nutch.
>

I understand that, and I would say it makes sense. However, as I said before, Signatures are usually part of the core codebase and not implemented as plugins (at least I've never implemented a signature as a plugin).

hth
Lewis
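
For reference, here is a minimal sketch of the db.signature.class override discussed above. It assumes the custom class is already on the classpath (for example in its own jar under nutch/lib, as you describe). The class name com.example.nutch.DomainAwareSignature is a made-up placeholder, not something that ships with Nutch; the stock default is org.apache.nutch.crawl.MD5Signature. The property goes inside the <configuration> element of conf/nutch-site.xml:

```xml
<!-- conf/nutch-site.xml (fragment) -->
<property>
  <name>db.signature.class</name>
  <!-- Hypothetical class name: replace with the fully qualified
       name of your own Signature subclass. -->
  <value>com.example.nutch.DomainAwareSignature</value>
</property>
```

Values in nutch-site.xml take precedence over nutch-default.xml, so the default file can stay untouched when you overlay a new binary release.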

