Here is how I am getting conf:

Configuration conf = NutchConfiguration.create();
BufferedReader br = new BufferedReader(conf.getConfResourceAsReader("data"));
System.out.println(br.ready());

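
A possible explanation, not confirmed anywhere in this thread: in local mode "data" resolves to a real directory on disk, while on Hadoop the conf folder travels inside the job jar, where a directory entry cannot be opened as a stream the way a single file can. Below is a minimal sketch of reading one dictionary file by its full path and failing loudly when the lookup returns nothing. It uses the plain Java class loader instead of Nutch's Configuration so that it stands alone; the class name DictionaryLoader, its two methods, and the file name data/terms.txt are all hypothetical.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class DictionaryLoader {

    // Hypothetical helper: reads one dictionary file, one entry per line,
    // skipping blank lines. A null stream means the lookup found nothing.
    public static List<String> readEntries(InputStream in, String name) throws IOException {
        if (in == null) {
            throw new FileNotFoundException("resource not found on classpath: " + name);
        }
        List<String> entries = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                String entry = line.trim();
                if (!entry.isEmpty()) {
                    entries.add(entry);
                }
            }
        }
        return entries;
    }

    // Looks up a single file (not a directory) by its full resource path.
    public static List<String> load(String name) throws IOException {
        ClassLoader cl = DictionaryLoader.class.getClassLoader();
        return readEntries(cl.getResourceAsStream(name), name);
    }

    public static void main(String[] args) {
        String name = "data/terms.txt"; // assumed file name, for illustration only
        try {
            System.out.println(load(name).size() + " entries loaded from " + name);
        } catch (IOException e) {
            System.out.println("could not load " + name + ": " + e.getMessage());
        }
    }
}
```

If the dictionary spans several files, one option under the same assumptions is to ship a small index file listing their names and load each name in turn, since a directory inside a jar cannot be listed through a stream.
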
On Thu, Jun 29, 2017 at 4:33 PM, SJC Multimedia <[email protected]>
wrote:
> Thanks Lewis and Jorge. Thanks for all the pointers.
>
> Very helpful as I feel I am almost there in getting it working.
>
> In local mode the dictionary works, but on Hadoop it still fails with an
> NPE.
>
> java.lang.NullPointerException
> at java.io.FilterInputStream.available(FilterInputStream.java:168)
> at sun.nio.cs.StreamDecoder.inReady(StreamDecoder.java:362)
> at sun.nio.cs.StreamDecoder.implReady(StreamDecoder.java:370)
> at sun.nio.cs.StreamDecoder.ready(StreamDecoder.java:184)
> at java.io.InputStreamReader.ready(InputStreamReader.java:195)
> at java.io.BufferedReader.ready(BufferedReader.java:456)
> at org.apache.nutch.parse.html.db.docscience.JarFileProvider.open(JarFileProvider.java:214)
>
> Line where it fails:
>
> BufferedReader br = new BufferedReader(conf.getConfResourceAsReader("data"));
>
> "data" is the name of a directory under the conf folder.
>
> best
> Dave
>
> On Thu, Jun 29, 2017 at 9:26 AM, lewis john mcgibbney <[email protected]>
> wrote:
>
>> Hi Dave,
>> Does this need to be done in the parsing phase? Parsing is already an
>> IO-intensive process... could you possibly do it at another phase?
>> Right now, the only plugin I can think of which ships with Nutch source,
>> and which consults an external resource (not packaged with Nutch) is the
>> index-geoip plugin [0]. This works in distributed mode.
>> Please also consider looking into the parsefilter-naivebayes plugin [1],
>> which loads a prebuilt model [2] as a resource that is then used in the
>> filtering.
>> hth
>> Lewis
>>
>> [0] https://github.com/apache/nutch/tree/master/src/plugin/index-geoip
>> [1] https://github.com/apache/nutch/tree/master/src/plugin/parsefilter-naivebayes
>> [2] https://github.com/apache/nutch/blob/master/src/plugin/parsefilter-naivebayes/src/java/org/apache/nutch/parsefilter/naivebayes/NaiveBayesParseFilter.java#L132-L137
>>
>> On Thu, Jun 29, 2017 at 8:29 AM, <[email protected]>
>> wrote:
>>
>> >
>> >
>> > From: SJC Multimedia <[email protected]>
>> > To: [email protected]
>> > Date: Thu, 29 Jun 2017 08:28:54 -0700
>> > Subject: Custom Plugin Resources Files
>> > I am building a custom plugin in Nutch 2.3.1 on Hadoop/HBase. In the
>> > plugin code, I need to pull in a dictionary of files and run some
>> > comparisons while parsing the document.
>> >
>> > Is there a way to include a directory of files through the custom plugin
>> > ant build framework that will work in both local and cluster (Hadoop MR)
>> > mode?
>> >
>> > Any pointers will be helpful.
>> >
>> > Thanks
>> > Dave
>> >
>> >
>>
>>
>> --
>> http://home.apache.org/~lewismc/
>> @hectorMcSpector
>> http://www.linkedin.com/in/lmcgibbney
>>
>
>