Re: very long fetch reduce task

Ferdy Galema Wed, 13 Jun 2012 07:44:12 -0700

You're right. I was already assuming parsing was enabled. If it's not,
URL normalizing and filtering are the next most likely cause of the
stalling tasks.
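
For anyone who wants to try the jstack / lazy man's profiler approach mentioned further down the thread, here is a rough sketch in Python. It is not taken from NUTCH-1387 or from Nutch itself. It assumes the `jstack` tool from the JDK is on the PATH and that you already know the pid of the slow task's JVM. The helper names (`top_frames`, `tally`, `sample`) are made up for illustration. The idea is simply to take several thread dumps a few seconds apart and count which method sits on top of the stacks most often; a frame that keeps showing up (a normalizer regex, a parser) is a likely suspect.

```python
import re
import subprocess
import time
from collections import Counter

# Matches a stack frame line in a thread dump, for example:
#   "\tat java.util.regex.Pattern$Loop.match(Pattern.java:4785)"
FRAME_RE = re.compile(r"^\s+at\s+([\w.$<>/]+)\(")


def top_frames(dump: str) -> list[str]:
    """Return the topmost stack frame of each thread in one dump."""
    frames = []
    expecting_top = False
    for line in dump.splitlines():
        if line.startswith('"'):
            # Thread header lines start with the quoted thread name.
            expecting_top = True
            continue
        match = FRAME_RE.match(line)
        if match and expecting_top:
            frames.append(match.group(1))
            expecting_top = False
    return frames


def tally(dumps) -> Counter:
    """Count how often each top frame appears across several dumps."""
    counts = Counter()
    for dump in dumps:
        counts.update(top_frames(dump))
    return counts


def sample(pid: int, n: int = 10, interval: float = 2.0) -> Counter:
    """Run `jstack <pid>` n times, `interval` seconds apart (assumed defaults)."""
    dumps = []
    for _ in range(n):
        result = subprocess.run(
            ["jstack", str(pid)], capture_output=True, text=True, check=True
        )
        dumps.append(result.stdout)
        time.sleep(interval)
    return tally(dumps)
```

Something like `sample(12345).most_common(5)` (12345 being a placeholder pid) would then list the five most frequent top frames. Idle threads parked in wait or sleep calls will show up too, so read the result with that in mind.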


On Wed, Jun 13, 2012 at 4:36 PM, Julien Nioche <
[email protected]> wrote:

> Unless parsing is activated in the fetch step, this is likely to be a
> different issue, e.g. URL normalization taking forever or something like
> that. Use jstack to see what the problem is.
>
> On 13 June 2012 12:36, Ferdy Galema <[email protected]> wrote:
>
> > I'd like to add that I've recently opened an issue that describes one of
> > the causes of this problem. Look for the lazy man's profiler trick to see
> > stacktraces of the slow parser task. It will give an indication which
> > parser code is stalling:
> > https://issues.apache.org/jira/browse/NUTCH-1387
> >
> > On Wed, Jun 13, 2012 at 12:40 PM, Lewis John Mcgibbney <
> > [email protected]> wrote:
> >
> > > Hi kaveh,
> > >
> > > We have recently been informed about parsing taking forever and a day
> > > in the reduce phase. This is currently being investigated. FYI the
> > > thread can be found below
> > >
> > > http://www.mail-archive.com/user%40nutch.apache.org/msg06560.html
> > >
> > > I wonder if you have looked into this and if there is a more general
> > > link between such issues?
> > >
> > > Lewis
> > >
> > > On Wed, Jun 13, 2012 at 1:31 AM, kaveh minooie <[email protected]> wrote:
> > > > Hi everybody
> > > >
> > > > I have an unusual issue. When I run Nutch on top of Hadoop, after the
> > > > map tasks finish, the reduce tasks start to finish very fast. Almost
> > > > all of them finish in less than 2 hours, but there are always one or
> > > > two that take a lot longer. This is a link to the list of completed
> > > > reduce tasks (that is all of them for that fetch job), and you can see
> > > > on the list that the last one took more than 18 hours to finish and
> > > > there is another one that took more than 6 hours. Does anybody have
> > > > any idea why this is happening?
> > > >
> > > > http://plutooz.com/hadoop.html
> > > >
> > > > p.s. this fetch job had about 1.5 million pages in it.
> > > >
> > > > thanks,
> > >
> > >
> > >
> > > --
> > > Lewis
> > >
> >
>
>
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
>
