Re: Fetcher hung on final hurdle - continue?

Andrzej Bialecki Fri, 08 Dec 2006 02:22:33 -0800

Robin Haswell wrote:

On Fri, 2006-12-08 at 11:01 +0100, Andrzej Bialecki wrote:
Ad 1.
I suspect that it's sorting the reduce output now ... in 0.8.x this operation has poor performance, especially when run on a single server. So, I advise patience, and giving it as much CPU and RAM as possible. For the future, it's also much, much better to run the fetcher in non-parsing mode and run "nutch parse" afterwards as a separate step.
Okay, I'll give it a while and see what happens. Is it possible to get
any information on what's going on? I'm running 0.8 pretty much
out-of-the-box on a single server. I've seen people mentioning phases of
Hadoop - can it tell me what's going on?

This should be shown in the logs - the map xx% or reduce xx% progress is printed to the logs.

The reduce phase consists of copying map outputs (reduce 0-33%), then sorting them - and here's where most CPU and disk IO and time is spent - which happens between 33%-66%, and finally copying sorted outputs to form the final result.
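
As a rough illustration of the percentages above, here is a minimal Python sketch that pulls the progress figures out of a log line and names the reduce sub-phase. The log line format ("map NN% reduce NN%"), the function names, and the choice to count exactly 33% as sorting and 66% or more as the final copy are all assumptions made for this example, not something taken from the Hadoop sources.

```python
import re

# Assumed shape of a progress line, e.g. "... map 100% reduce 45%".
# Adjust the pattern if your logs look different.
PROGRESS = re.compile(r"map\s+(\d+)%\s+reduce\s+(\d+)%")


def parse_progress(line):
    """Return (map_percent, reduce_percent), or None if the line has no progress info."""
    match = PROGRESS.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def reduce_phase(percent):
    """Map a reduce percentage to the sub-phase described above.

    Boundary handling (33 counts as sort, 66 and up as the final copy)
    is an assumption of this sketch.
    """
    if percent < 33:
        return "copy"   # copying map outputs
    if percent < 66:
        return "sort"   # sorting: most CPU, disk IO and time
    return "write"      # copying sorted outputs into the final result
```

Feeding each log line through parse_progress() and passing the second value to reduce_phase() tells you whether a job that looks hung is simply sitting in the slow sort step.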

You can also do a kill -SIGQUIT <pid> to get a thread dump - you will be able to see what the threads are really doing.
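
If you would rather trigger the dump from a script than from the shell, the same signal can be sent from Python. This is only a sketch, assuming a POSIX system; request_thread_dump is a hypothetical helper name. A JVM that receives SIGQUIT writes the thread dump to its own standard output and keeps running, whereas most other processes are terminated by it, so make sure the pid really belongs to the Java process.

```python
import os
import signal


def request_thread_dump(pid):
    """Send SIGQUIT to the process with the given pid.

    Equivalent to running `kill -SIGQUIT <pid>`. For a JVM this asks for a
    thread dump; the dump appears on the JVM's stdout, not in this script.
    """
    os.kill(pid, signal.SIGQUIT)
```

Call it as request_thread_dump(pid) with the pid of the fetcher's Java process, then look at wherever that process's standard output is being written.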


--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
