Hi Dave,
Does this need to be done in the parsing phase? Parsing is already an
IO-intensive process... could you possibly do it in another phase?
Right now, the only plugin I can think of which ships with Nutch source,
and which consults an external resource (not packaged with Nutch) is the
index-geoip plugin [0]. This works in distributed mode.
Please also consider looking into the parsefilter-naivebayes plugin [1],
which loads a prebuilt model [2] as a resource that is then used for
the filtering.
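
To illustrate the "load it as a resource" idea, here is a minimal sketch
in plain Java. It is not code from either plugin: the class name
DictionaryLoader, the one-term-per-line file format, and the resource
name in the usage note are all hypothetical. It also assumes the file
ends up on the classpath in both local and distributed runs, which
depends on how the plugin and job are packaged.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Nutch: reads a term list once so a
// filter can keep it in memory instead of touching disk per document.
final class DictionaryLoader {

  private DictionaryLoader() {
  }

  // Parses one term per line; blank lines and lines starting with '#'
  // are skipped, terms are trimmed and lower-cased, duplicates dropped.
  static Set<String> parse(Reader reader) throws IOException {
    Set<String> terms = new LinkedHashSet<>();
    try (BufferedReader in = new BufferedReader(reader)) {
      String line;
      while ((line = in.readLine()) != null) {
        String term = line.trim().toLowerCase(Locale.ROOT);
        if (term.isEmpty() || term.startsWith("#")) {
          continue;
        }
        terms.add(term);
      }
    }
    return terms;
  }

  // Looks the file up on the classpath rather than by filesystem path,
  // so the same code works wherever the resource is packaged with the job.
  static Set<String> load(String resourceName) throws IOException {
    ClassLoader loader = Thread.currentThread().getContextClassLoader();
    if (loader == null) {
      loader = DictionaryLoader.class.getClassLoader();
    }
    InputStream stream = loader.getResourceAsStream(resourceName);
    if (stream == null) {
      throw new IOException("Resource not found on classpath: " + resourceName);
    }
    return parse(new InputStreamReader(stream, StandardCharsets.UTF_8));
  }
}
```

A filter could call DictionaryLoader.load("my-dictionary.txt") (a
made-up file name) once when it is configured and reuse the resulting
set for every document, which keeps the extra IO out of the per-document
parsing path.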
hth
Lewis

[0] https://github.com/apache/nutch/tree/master/src/plugin/index-geoip
[1] https://github.com/apache/nutch/tree/master/src/plugin/parsefilter-naivebayes
[2] https://github.com/apache/nutch/blob/master/src/plugin/parsefilter-naivebayes/src/java/org/apache/nutch/parsefilter/naivebayes/NaiveBayesParseFilter.java#L132-L137

On Thu, Jun 29, 2017 at 8:29 AM, <[email protected]> wrote:

>
>
> From: SJC Multimedia <[email protected]>
> To: [email protected]
> Cc:
> Bcc:
> Date: Thu, 29 Jun 2017 08:28:54 -0700
> Subject: Custom Plugin Resources Files
> I am building a custom plugin in Nutch 2.3.1 on Hadoop/HBase. In the plugin
> code, I need to pull in a dictionary of files and run some comparisons
> while parsing the document.
>
> Is there a way to include a directory of files through the custom plugin
> ant build framework that will work in both local and cluster (Hadoop MR)
> mode?
>
> Any pointers will be helpful.
>
> Thanks
> Dave
>
>


-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney
