Re: Fetcher hung on final hurdle - continue?

Andrzej Bialecki Fri, 08 Dec 2006 02:01:45 -0800

Robin Haswell wrote:

Hi there


My fetch process must have nearly finished, and now it's slaying the
server. I have a horrible feeling it's hung. I have the parse option
enabled in the configuration so it could be doing that - I've fetched a
lot of documents (it took 1 week at 250KB/s)

I guess I have two questions:

1. Am I going to have to kill the fetch and start again? It's running at
100% CPU and 68% memory - this has only spiked when the debug messages
about fetching ceased appearing, and it's taken a few hours after
finishing to get to this CPU and memory usage

2. Can I resume an unfinished fetch where I left off before? If I have
to kill this I can't bear the thought of waiting another week to fetch.


Ad 1.

I suspect that it's sorting the reduce output now ... in 0.8.x this operation has poor performance, especially when run on a single server. So, I advise patience, and giving as much CPU and RAM as possible. For the future, it's also much much better to run the fetcher in non-parsing mode and run "nutch parse" afterwards as a separate step.
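
As a rough illustration of that two-step run, here is a minimal sketch. It assumes the 0.8.x command-line tools, where the fetcher accepts a -noParsing switch; the segment path is a hypothetical placeholder, and the helper function only prints the commands so they can be checked before being run by hand.

```shell
# Hypothetical segment path; replace it with your real segment directory.
SEGMENT="${SEGMENT:-crawl/segments/20061208000000}"

# Print the two-step sequence instead of executing it:
#   1. fetch the segment without parsing (assumes the 0.8.x -noParsing switch)
#   2. parse the fetched content afterwards as a separate job
two_step_cmds() {
    printf 'bin/nutch fetch %s -noParsing\n' "$1"
    printf 'bin/nutch parse %s\n' "$1"
}

two_step_cmds "$SEGMENT"
```

Splitting the work this way means a slow or failing parse no longer puts the fetched data at risk, since the parse step can be rerun on its own.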

If you run with disk mounted in the default mode, you may try to change it on the fly to "async,noatime"; check the "mount" man page for details on how to do this on a live system. Of course, this has the price that if the system crashes then you are likely to lose a lot more data ...
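
A minimal sketch of such a live remount, assuming a Linux system: the mount point /data is a hypothetical placeholder for whichever filesystem holds the segment data, and the helper only prints the command so it can be reviewed and then run as root.

```shell
# Hypothetical mount point; replace it with the filesystem holding the segments.
MOUNT_POINT="${MOUNT_POINT:-/data}"

# Print the remount command rather than running it, since it needs root
# and changes the options of a live filesystem.
remount_cmd() {
    printf 'mount -o remount,async,noatime %s\n' "$1"
}

remount_cmd "$MOUNT_POINT"
```

Running "mount" with no arguments before and after shows the options currently in effect, which is an easy way to confirm whether the change was applied.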


Ad 2.

Unfortunately, it's not possible for now to keep partial results.

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
