unless the parsing is activated in the fetch step. This is likely to be a
different issue, e.g. URL normalization taking forever or something like
that. Use jstack to see what the problem is.
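As a rough illustration of the sampling idea behind jstack (and the lazy man's profiler trick referenced in NUTCH-1387), you take several stack snapshots of the stuck thread and see which methods keep showing up. The Java sketch below is a stand-in under stated assumptions. It is not Nutch code and not the exact procedure from NUTCH-1387. It samples in-process with `Thread.getStackTrace()` so it runs without a Hadoop cluster. `hypotheticalSlowParse` is a hypothetical placeholder for a stalled parser or URL normalizer.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.concurrent.CountDownLatch;

/**
 * Minimal sampling profiler sketch: snapshot a thread's stack repeatedly and
 * count how many snapshots each method appears in. Methods present in nearly
 * every snapshot are where the thread is spending (or stalling) its time.
 */
public class LazyProfiler {

    /** Takes `samples` snapshots of `target`'s stack, `intervalMs` apart. */
    public static Map<String, Integer> sample(Thread target, int samples, long intervalMs)
            throws InterruptedException {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < samples; i++) {
            // Count each method at most once per snapshot so recursion
            // does not inflate the numbers.
            Set<String> seen = new HashSet<>();
            for (StackTraceElement frame : target.getStackTrace()) {
                seen.add(frame.getClassName() + "." + frame.getMethodName());
            }
            for (String method : seen) {
                counts.merge(method, 1, Integer::sum);
            }
            Thread.sleep(intervalMs);
        }
        return counts;
    }

    private static volatile boolean running = true;

    /** Hypothetical stand-in for a parse that never finishes (not Nutch code). */
    static void hypotheticalSlowParse(CountDownLatch started) {
        started.countDown();
        double x = 0;
        while (running) {
            x += Math.sqrt(x + 1); // burn CPU like a runaway parse
        }
    }

    /** Starts a stuck worker thread, samples it, then stops it. */
    public static Map<String, Integer> demo(int samples, long intervalMs)
            throws InterruptedException {
        running = true;
        CountDownLatch started = new CountDownLatch(1);
        Thread worker = new Thread(() -> hypotheticalSlowParse(started), "parser-worker");
        worker.start();
        started.await(); // ensure the worker is inside the slow method
        try {
            return sample(worker, samples, intervalMs);
        } finally {
            running = false;
            worker.join();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, Integer> counts = demo(20, 10);
        // Print methods ordered by how many snapshots they appeared in.
        counts.entrySet().stream()
              .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
              .forEach(e -> System.out.println(e.getValue() + "\t" + e.getKey()));
    }
}
```

Against a real reduce task, the equivalent is to run `jstack <pid>` on the task's child JVM a few times, a few seconds apart, and compare the dumps. A method that sits in almost every dump is the likely place where parsing or normalization is stalling.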

On 13 June 2012 12:36, Ferdy Galema <[email protected]> wrote:

> I'd like to add that I've recently opened an issue that describes one of
> the causes of this problem. Look for the lazy man's profiler trick to see
> stack traces of the slow parser task. It will give an indication of which
> parser code is stalling:
> https://issues.apache.org/jira/browse/NUTCH-1387
>
> On Wed, Jun 13, 2012 at 12:40 PM, Lewis John Mcgibbney <
> [email protected]> wrote:
>
> > Hi Kaveh,
> >
> > We have recently been informed about parsing taking forever and a day
> > in the reduce phase. This is currently being investigated. FYI, the
> > thread can be found below:
> >
> > http://www.mail-archive.com/user%40nutch.apache.org/msg06560.html
> >
> > I wonder if you have looked into this and if there is a more general
> > link between such issues?
> >
> > Lewis
> >
> > On Wed, Jun 13, 2012 at 1:31 AM, kaveh minooie <[email protected]> wrote:
> > > Hi everybody,
> > >
> > > I have an unusual issue. When I run Nutch on top of Hadoop, after the
> > > map tasks finish, the reduce tasks start finishing very fast; almost all
> > > of them finish in less than 2 hours, but there are always one or two
> > > that take a lot longer. This is a link to the list of completed reduce
> > > tasks (that is, all of them for that fetch job), and you can see on the
> > > list that the last one took more than 18 hours to finish and another one
> > > took more than 6 hours. Does anybody have any idea why this is happening?
> > >
> > > http://plutooz.com/hadoop.html
> > >
> > > P.S. This fetch job had about 1.5 million pages in it.
> > >
> > > Thanks,
> >
> >
> >
> > --
> > Lewis
> >
>



-- 
*Open Source Solutions for Text Engineering*

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
