Yes, there is definitely interest, Breno! Thanks, we would appreciate
your contribution. If you use GitHub, the instructions are here:

http://github.com/apache/nutch/#contributing

Otherwise, a JIRA issue with an SVN patch would be fine too.

Thanks!

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++




-----Original Message-----
From: Breno Faria <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Wednesday, June 3, 2015 at 1:13 AM
To: "[email protected]" <[email protected]>
Subject: Re: Deduplication -- custom Signature

>Hi Lewis,
>
>Thanks for the explanation!
>
>> I understand that and I would say it makes sense, however as I said
>>before Signatures are usually part of the core codebase and not
>>implemented as plugins (at least I've never implemented a signature as a
>>plugin).
>
>Is there interest in including the domain aware Signature in nutch? I
>would gladly contribute.
>
>Cheers
>
>Breno Faria
>Software Architect – Text Analytics
>Intrafind Software AG
>Tel:      +49 (89) 3090446-26
>Web:    http://www.intrafind.de
>
>
>-----Original Message-----
>From: Lewis John Mcgibbney [mailto:[email protected]]
>Sent: Tuesday, June 2, 2015 19:06
>To: [email protected]
>Subject: Re: Deduplication -- custom Signature
>
>Hi Breno,
>
>On Tue, Jun 2, 2015 at 1:38 AM, <[email protected]> wrote:
>
>>
>> We are indexing several domains for a specific project, which may
>> contain duplicated content (e.g. pdf files). The users of the system
>> come from different organisations and wonder why the content is not
>> appearing under certain domains. It's a usability issue (with a
>>political aftertaste).
>>
>
>Thanks for the explanation.
>
>
>>
>> Yes, I extended Signature, and I'm also able to use it through the
>> db.signature.class property, if I pack the class into its own jar and
>> put it into nutch/lib. I'd much rather like to include it in our
>> existing plugin jar, though.
>
>
>This is rather strange, as Signatures are part of the *core* codebase,
>i.e. /src/java and not /src/plugins. Does this make sense?
>
>
>> I'm not sure what you mean by ".job jar.".
>
>
>If you build the Nutch source, you'll see /runtime/deploy/nutch.XXX.job;
>this is the main artifact sent to deployment clusters (JobTracker).
>
>
>> We have been developing our plugin outside of nutch and placing the
>> corresponding jars into a plugin directory together with the
>> plugin.xml. Is there any "magic" happening regarding the classpath
>> when one has ant building it inside nutch?
>
>
>Our general plugin documentation is here:
>http://wiki.apache.org/nutch/PluginCentral
>For the class-loading question specifically, see:
>http://wiki.apache.org/nutch/WhatsTheProblemWithPluginsAndClass-loading
>This is why I think it is a bit strange that you've implemented your
>signature as a plugin and not as part of the core codebase.
>
>
>
>> Is there a naming convention regarding the plugin name and
>> corresponding jar? Do they have to match?
>>
>
>For plugins, their accompanying and required files, and naming
>conventions, please see http://wiki.apache.org/nutch/WritingPluginExample
>
>
>>
>> The reason behind developing our plugin outside of nutch and
>> decoupling the build environment is to make updates of nutch easier.
>> That way we can simply download the binary release and overlay our
>> plugin. I realize now this seems to be a little off the usual way of
>>writing plugins for nutch.
>>
>
>I understand that and I would say it makes sense, however as I said
>before Signatures are usually part of the core codebase and not
>implemented as plugins (at least I've never implemented a signature as a
>plugin).
>hth
>Lewis
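
For readers following the thread, here is a minimal sketch of what a
"domain aware" signature could compute. It is not Breno's code: the class
name DomainAwareSignatureSketch and the static calculate(url, content)
helper are hypothetical, and it assumes "domain" simply means the URL's
host. In Nutch the logic would live in a subclass of Signature selected via
db.signature.class, as described above; the sketch avoids Nutch classes so
that it runs standalone.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical, standalone illustration; not part of Nutch.
public class DomainAwareSignatureSketch {

    // MD5 over the lower-cased host, a separator byte, and the raw content.
    // Assumption: the host stands in for the "domain" discussed in the thread.
    public static byte[] calculate(String url, byte[] content) {
        String host = URI.create(url).getHost();
        if (host == null) {
            host = ""; // URLs without a host fall back to a content-only hash
        }
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            md5.update(host.toLowerCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8));
            md5.update((byte) 0); // keeps host and content from running together
            md5.update(content);
            return md5.digest();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this is unexpected
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] pdf = "identical file bytes".getBytes(StandardCharsets.UTF_8);
        byte[] onA = calculate("http://a.example.org/doc.pdf", pdf);
        byte[] onB = calculate("http://b.example.org/doc.pdf", pdf);
        byte[] copyOnA = calculate("http://a.example.org/copy.pdf", pdf);

        // Same content on different hosts: signatures differ, both are kept
        System.out.println(Arrays.equals(onA, onB));     // prints "false"
        // Same content twice on one host: signatures match, still deduplicated
        System.out.println(Arrays.equals(onA, copyOnA)); // prints "true"
    }
}
```

With a signature of this shape, the duplicated PDF would remain visible
under each organisation's domain, while copies within a single domain
would still collapse to one entry.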
