If you are running Nutch on a Hadoop cluster, the Nutch log output ends up in
the logs corresponding to each mapper and reducer of each phase.
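
As a rough sketch of how to check this, assuming you have copied the task logs
to a local directory: the small program below walks that directory and prints
every file containing a given string. The class name TaskLogScan, the directory
argument and the default search string "org.apache.nutch" are only illustrative
assumptions on my side; whether the full logger name appears in each line
depends on the log4j pattern your cluster uses, so adjust the string as needed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Nutch or Hadoop.
public class TaskLogScan {

    /** Returns the regular files under logRoot with at least one line containing needle. */
    public static List<Path> filesMentioning(Path logRoot, String needle) throws IOException {
        try (Stream<Path> paths = Files.walk(logRoot)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> mentions(p, needle))
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    private static boolean mentions(Path file, String needle) {
        // ISO-8859-1 decodes any byte sequence, so stray bytes in a log do not abort the scan.
        try (Stream<String> lines = Files.lines(file, StandardCharsets.ISO_8859_1)) {
            return lines.anyMatch(line -> line.contains(needle));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed defaults: current directory and the Nutch package prefix.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        String needle = args.length > 1 ? args[1] : "org.apache.nutch";
        for (Path p : filesMentioning(root, needle)) {
            System.out.println(p);
        }
    }
}
```

Run it as "java TaskLogScan <log-dir> <search-string>". If nothing is printed
for any task attempt, the Nutch classes are probably not logging at the level
you expect, rather than the logs being somewhere else.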


On Mon, May 5, 2014 at 7:33 PM, chethan <[email protected]> wrote:

> Also, I'm not able to see any logs generated by the plugin or Nutch base
> classes. There are lots of Hadoop logs, but none from Nutch. Any idea what
> could be the case?
>
> Regards,
>
> --
> Chethan Prasad
>
>
> On Mon, May 5, 2014 at 12:14 PM, chethan <[email protected]> wrote:
>
> > Thanks Feng and Julien for your replies. I will take a look at both
> > options and update what worked.
> >
> > Regards,
> >
> > --
> > Chethan Prasad
> >
> >
> > On Mon, May 5, 2014 at 12:10 AM, Julien Nioche <
> > [email protected]> wrote:
> >
> >> Chethan
> >>
> >> Have a look at Behemoth [https://github.com/DigitalPebble/behemoth] if you
> >> haven't already done so. Porting the code from the GATE module into an
> >> IndexingFilter should not be too difficult. What we do there is that the
> >> GATE pipeline is stored on HDFS and loaded by the slaves via the
> >> distributed cache.
> >>
> >> Alternatively you could use Nutch just for crawling, then use the Nutch
> >> and GATE modules of Behemoth as well as the SOLR or ElasticSearch ones if
> >> that's what you want to do.
> >>
> >> HTH
> >>
> >> Julien
> >>
> >>
> >> On 4 May 2014 06:52, chethan <[email protected]> wrote:
> >>
> >> > I have set up Nutch to crawl on Amazon EMR and I have a plugin that
> >> > uses GATE <https://gate.ac.uk/> for
> >> > text processing in the Indexing filters. GATE requires certain static
> >> > resources (some XML and text files) to be loaded for it to be initialized.
> >> > I tried to bundle these resources in the job jar and load them from the
> >> > classpath but that didn't work. I also tried copying them to HDFS and
> >> > loading them from there but that too failed.
> >> >
> >> > What is the best way to bundle such static resources and reference them in
> >> > the Indexing filters? I am working on copying the file to the distributed
> >> > cache and loading it from there but I wanted to know how others are
> >> > handling this. Thanks.
> >> >
> >> > Regards,
> >> >
> >> > --
> >> > Chethan Prasad
> >> >
> >>
> >>
> >>
> >> --
> >>
> >> Open Source Solutions for Text Engineering
> >>
> >> http://digitalpebble.blogspot.com/
> >> http://www.digitalpebble.com
> >> http://twitter.com/digitalpebble
> >>
> >
> >
>



-- 
Don't Grow Old, Grow Up... :-)
