If you are running Nutch on a Hadoop cluster, look in the logs corresponding to each mapper and reducer of each phase.
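
Since the messages are spread over many per-task log files, something like the following may help to find them. It is only a rough sketch, not anything that ships with Nutch or Hadoop: the class name TaskLogGrep is made up, and the directory you pass in is whatever location your cluster uses for per-task logs (that location depends on your Hadoop version and configuration, so I am not assuming a particular path). It walks the directory and prints every line that contains a given string, for example a Nutch package prefix.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: scans a tree of task log files for matching lines.
final class TaskLogGrep {

    /** Returns "relative/file: line" for every line under root that contains needle. */
    static List<String> grep(Path root, String needle) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            // Collect regular files only, in a stable order.
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        List<String> hits = new ArrayList<>();
        for (Path file : files) {
            // ISO_8859_1 maps every byte to a char, so stray bytes in a log never abort the scan.
            for (String line : Files.readAllLines(file, StandardCharsets.ISO_8859_1)) {
                if (line.contains(needle)) {
                    hits.add(root.relativize(file) + ": " + line);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: directory holding the per-task logs (location is cluster-specific)
        // args[1]: text to look for; defaults to the Nutch package prefix
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        String needle = args.length > 1 ? args[1] : "org.apache.nutch";
        grep(root, needle).forEach(System.out::println);
    }
}
```

If this prints nothing for the tasks of the indexing phase, the plugin is probably not logging at all, and it would be worth checking the log level the tasks run with.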

On Mon, May 5, 2014 at 7:33 PM, chethan <[email protected]> wrote:

> Also, I'm not able to see any logs generated by the plugin or Nutch base
> classes. There are lots of Hadoop logs, but none from Nutch. Any idea what
> could be the case?
>
> Regards,
>
> --
> Chethan Prasad
>
>
> On Mon, May 5, 2014 at 12:14 PM, chethan <[email protected]> wrote:
>
> > Thanks Feng and Julien for your replies. I will take a look at both
> > options and update what worked.
> >
> > Regards,
> >
> > --
> > Chethan Prasad
> >
> >
> > On Mon, May 5, 2014 at 12:10 AM, Julien Nioche <
> > [email protected]> wrote:
> >
> >> Chethan
> >>
> >> Have a look at Behemoth [https://github.com/DigitalPebble/behemoth] if
> >> you haven't already done so. Porting the code from the GATE module into
> >> an IndexingFilter should not be too difficult. What we do there is that
> >> the GATE pipeline is stored on HDFS and loaded by the slaves via the
> >> distributed cache.
> >>
> >> Alternatively you could use the Nutch just for crawling then use the
> >> Nutch and GATE modules of Behemoth as well as the SOLR or ElasticSearch
> >> ones if that's what you want to do.
> >>
> >> HTH
> >>
> >> Julien
> >>
> >>
> >> On 4 May 2014 06:52, chethan <[email protected]> wrote:
> >>
> >> > I have setup Nutch to crawl on Amazon EMR and I have a plugin that
> >> > uses GATE<https://gate.ac.uk/> for text processing in the Indexing
> >> > filters. GATE requires certain static resources (some xmls and text
> >> > files) to be loaded for it to be initialized. I tried to bundle these
> >> > resources in the job jar and load them from the classpath but that
> >> > didn't work. I also tried copying them to HDFS and loading them from
> >> > there but that too failed.
> >> >
> >> > What is the best way to bundle such static resources and reference
> >> > them in the Indexing filters? I am working on copying the file to the
> >> > distributed cache and loading it from there but I wanted to know how
> >> > others are handling this. Thanks.
> >> >
> >> > Regards,
> >> >
> >> > --
> >> > Chethan Prasad
> >> >
> >>
> >>
> >>
> >> --
> >>
> >> Open Source Solutions for Text Engineering
> >>
> >> http://digitalpebble.blogspot.com/
> >> http://www.digitalpebble.com
> >> http://twitter.com/digitalpebble
> >>
> >
> >
>

--
Don't Grow Old, Grow Up... :-)

