Hi Dave,

Does this need to be done in the parsing phase? Parsing is already an IO-intensive process... could you possibly do it in another phase?

Right now, the only plugin I can think of which ships with the Nutch source and which consults an external resource (not packaged with Nutch) is the index-geoip plugin [0]. This works in distributed mode.

Please also consider looking into parsefilter-naivebayes [1], which loads a prebuilt model [2] as a resource that is then used for the filtering.

hth
Lewis
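For the use case in the quoted question (a dictionary consulted while parsing), one common pattern is to package the word list as a classpath resource, either inside the plugin jar or in the conf/ directory that ships with the job, and load it once when the plugin is configured rather than once per document. This is a minimal sketch under those assumptions. The class and method names (DictionaryLoader, readEntries, loadFromClasspath) are hypothetical and not part of Nutch, and the one-entry-per-line format with '#' comments is also an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical helper for a custom parse plugin (not part of Nutch):
 * loads a word list once so it can be consulted while parsing.
 */
class DictionaryLoader {

  /**
   * Reads one entry per line, trimming whitespace, skipping blank lines and
   * lines starting with '#', and lower-casing entries for case-insensitive lookups.
   */
  static Set<String> readEntries(Reader source) throws IOException {
    Set<String> entries = new HashSet<>();
    try (BufferedReader br = new BufferedReader(source)) {
      String line;
      while ((line = br.readLine()) != null) {
        String entry = line.trim();
        if (entry.isEmpty() || entry.startsWith("#")) {
          continue; // blank line or comment
        }
        entries.add(entry.toLowerCase(Locale.ROOT));
      }
    }
    return entries;
  }

  /**
   * Loads a dictionary bundled on the classpath. Because the lookup goes
   * through the classpath rather than the local filesystem, the same code
   * path is intended to work in local and distributed (MapReduce) mode,
   * provided the resource is actually packaged with the job (an assumption).
   * In a Nutch plugin, Hadoop's Configuration#getConfResourceAsReader(String)
   * is an alternative way to obtain the Reader.
   */
  static Set<String> loadFromClasspath(String resourceName) throws IOException {
    InputStream in = DictionaryLoader.class.getClassLoader().getResourceAsStream(resourceName);
    if (in == null) {
      throw new IOException("Dictionary resource not found on classpath: " + resourceName);
    }
    return readEntries(new InputStreamReader(in, StandardCharsets.UTF_8));
  }
}
```

Calling loadFromClasspath once, for example from the plugin's setConf(Configuration) method, and keeping the resulting Set in a field reduces the per-document work to hash lookups, which matters given that parsing is already IO intensive.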
[0] https://github.com/apache/nutch/tree/master/src/plugin/index-geoip
[1] https://github.com/apache/nutch/tree/master/src/plugin/parsefilter-naivebayes
[2] https://github.com/apache/nutch/blob/master/src/plugin/parsefilter-naivebayes/src/java/org/apache/nutch/parsefilter/naivebayes/NaiveBayesParseFilter.java#L132-L137

On Thu, Jun 29, 2017 at 8:29 AM, <[email protected]> wrote:
>
> From: SJC Multimedia <[email protected]>
> To: [email protected]
> Cc:
> Bcc:
> Date: Thu, 29 Jun 2017 08:28:54 -0700
> Subject: Custom Plugin Resources Files
> I am building a custom plugin in Nutch 2.3.1 on Hadoop/HBase. In the plugin
> code, I need to pull in a dictionary of files and run some comparisons
> while parsing the document.
>
> Is there a way to include directory of files through the custom plugin ant
> build framework that will work on both local and cluster(hadoop MR) mode?
>
> Any pointers will be helpful.
>
> Thanks
> Dave
>

--
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney

