Thanks for the responses, and yes, in my case, parsing IS enabled and
happens during the fetch job.
On 06/13/2012 07:43 AM, Ferdy Galema wrote:
You're right. I was already assuming parsing was enabled. If it's not,
normalizing and filtering are the next most probable cause of the
stalling tasks.
On Wed, Jun 13, 2012 at 4:36 PM, Julien Nioche <[email protected]> wrote:
Unless parsing is activated in the fetch step, this is likely to be a
different issue, e.g. URL normalization taking forever or something like
that. Use jstack to see what the problem is.
On 13 June 2012 12:36, Ferdy Galema <[email protected]> wrote:
I'd like to add that I've recently opened an issue that describes one of
the causes of this problem. Look for the lazy man's profiler trick to see
stack traces of the slow parser task. It will give an indication of which
parser code is stalling:
https://issues.apache.org/jira/browse/NUTCH-1387
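
For readers who want to try the jstack / lazy man's profiler idea mentioned
in this thread, here is a minimal sketch in Python. It is not taken from
NUTCH-1387; it only assumes that the JDK's jstack tool is on the PATH and
that you know the PID of the slow task's JVM. The script name and the
helper names (top_frames, sample) are made up for illustration. It takes
several thread dumps and counts which method is at the top of each
thread's stack; a frame that keeps showing up across samples is a likely
place where the task is stuck.

```python
import collections
import subprocess
import sys
import time


def top_frames(dump_text):
    """Return the innermost frame of each thread in one jstack-style dump."""
    frames = []
    in_thread = False
    for line in dump_text.splitlines():
        stripped = line.strip()
        if stripped.startswith('"'):
            # A line starting with a quoted thread name opens a thread block.
            in_thread = True
        elif in_thread and stripped.startswith("at "):
            # Keep only the first "at ..." line, i.e. the top of the stack.
            frames.append(stripped[3:])
            in_thread = False
    return frames


def sample(pid, samples=10, interval=1.0):
    """Run jstack repeatedly and tally the top frame of every thread."""
    counts = collections.Counter()
    for _ in range(samples):
        result = subprocess.run(
            ["jstack", str(pid)], capture_output=True, text=True, check=True
        )
        counts.update(top_frames(result.stdout))
        time.sleep(interval)
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python sample_stacks.py <pid of the slow task JVM>
    for frame, hits in sample(int(sys.argv[1])).most_common(10):
        print(f"{hits:4d}  {frame}")
```

Idle threads (waiting on a lock or a socket) will also be counted, so look
for frames in parser, normalizer or filter code rather than at the raw top
of the list. jstack usually has to be run as the same user that owns the
task JVM.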
On Wed, Jun 13, 2012 at 12:40 PM, Lewis John Mcgibbney <[email protected]> wrote:
Hi kaveh,
We have recently been informed about parsing taking forever and a day
in the reduce phase. This is currently being investigated. FYI the
thread can be found below
http://www.mail-archive.com/user%40nutch.apache.org/msg06560.html
I wonder if you have looked into this and if there is a more general
link between such issues?
Lewis
On Wed, Jun 13, 2012 at 1:31 AM, kaveh minooie <[email protected]> wrote:
Hi everybody
I have an unusual issue. When I run Nutch on top of Hadoop, after the
map tasks finish, the reduce tasks start finishing very quickly. Almost
all of them finish in less than 2 hours, but there are always one or two
that take a lot longer. This is a link to the list of completed reduce
tasks (that is all of them for that fetch job); you can see on the list
that the last one took more than 18 hours to finish and there is another
one that took more than 6 hours. Does anybody have any idea why this is
happening?
http://plutooz.com/hadoop.html
p.s. this fetch job had about 1.5 million pages in it.
thanks,
--
Lewis
--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
--
Kaveh Minooie
www.plutoz.com