Unless parsing is activated in the fetch step, this is likely to be a different issue, e.g. URL normalization taking forever or something similar. Use jstack to see what the problem is.
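To make the jstack suggestion concrete, here is a minimal sketch of the "take a few thread dumps and see what keeps showing up" approach. It assumes you have already saved several `jstack <pid>` dumps of the slow reduce task's JVM to text files, a few seconds apart. The function names `top_frames` and `hot_frames` are made up for this illustration and are not part of Nutch or Hadoop. The parsing relies only on the usual dump layout: a thread header line starting with a double quote, followed by frame lines starting with "at".

```python
import re
import sys
from collections import Counter

# A stack-frame line in a jstack dump looks like:
#     at some.package.Class.method(Class.java:123)
FRAME_RE = re.compile(r"^\s+at\s+(\S+)\(")


def top_frames(dump, thread_filter=""):
    """Return the topmost frame of each thread in one dump.

    Only threads whose header line contains thread_filter are kept
    (the empty default keeps every thread that has a stack).
    """
    frames = []
    need_frame = False
    for line in dump.splitlines():
        if line.startswith('"'):
            # Thread header line, e.g. "main" prio=10 tid=... runnable
            need_frame = thread_filter in line
        elif need_frame:
            match = FRAME_RE.match(line)
            if match:
                frames.append(match.group(1))
                need_frame = False  # only the topmost frame is counted
    return frames


def hot_frames(dumps, thread_filter=""):
    """Count how often each topmost frame appears across several dumps."""
    counts = Counter()
    for dump in dumps:
        counts.update(top_frames(dump, thread_filter))
    return counts.most_common()


if __name__ == "__main__":
    # Hypothetical usage: python hot_frames.py dump1.txt dump2.txt ...
    texts = []
    for path in sys.argv[1:]:
        with open(path) as handle:
            texts.append(handle.read())
    for frame, count in hot_frames(texts):
        print(count, frame)
```

If the same method is at the top of the stack in most of the dumps, that is probably where the task is stuck (a normalizer, a parser, a regex, etc.). Passing something like `thread_filter='"main"'` limits the count to one thread, which cuts the noise from idle daemon threads.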
On 13 June 2012 12:36, Ferdy Galema <[email protected]> wrote:

> I'd like to add that I've recently opened an issue that describes one of
> the causes of this problem. Look for the lazy man's profiler trick to see
> stacktraces of the slow parser task. It will give an indication which
> parser code is stalling:
> https://issues.apache.org/jira/browse/NUTCH-1387
>
> On Wed, Jun 13, 2012 at 12:40 PM, Lewis John Mcgibbney <
> [email protected]> wrote:
>
> > Hi kaveh,
> >
> > We have recently been informed about parsing taking forever and a day
> > in the reduce phase. This is currently being investigated. FYI the
> > thread can be found below
> >
> > http://www.mail-archive.com/user%40nutch.apache.org/msg06560.html
> >
> > I wonder if you have looked into this and if there is a more general
> > link between such issues?
> >
> > Lewis
> >
> > On Wed, Jun 13, 2012 at 1:31 AM, kaveh minooie <[email protected]> wrote:
> > > Hi everybody
> > >
> > > I have an unusual issue. when i run nutch on top off hadoop, after the map
> > > tasks finish, the reduce task start to finish very fast almost all of them
> > > finish in less than 2 hours but there is alway one or two that take a lot
> > > longer. this is a link to the list of a completed reduce tasks ( that is all
> > > of them for that fetch job) and you can see on the list that the last one
> > > took more than 18 hours to finish and there is another one that took more
> > > than 6 hours. does any body have any idea why this is happening?
> > >
> > > http://plutooz.com/hadoop.html
> > >
> > > p.s. this fetch job had about 1.5 million pages in it.
> > >
> > > thanks,
> > >
> >
> > --
> > Lewis
> >
>

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

