Robin Haswell wrote:
Hi there

My fetch process must have nearly finished, and now it's slaying the
server. I have a horrible feeling it's hung. I have the parse option
enabled in the configuration so it could be doing that - I've fetched a
lot of documents (it took 1 week at 250KB/s)

I guess I have two questions:

1. Am I going to have to kill the fetch and start again? It's running at
100% CPU and 68% memory - this has only spiked when the debug messages
about fetching ceased appearing, and it's taken a few hours after
finishing to get to this CPU and memory usage

2. Can I resume an unfinished fetch where I left off before? If I have
to kill this I can't bear the thought of waiting another week to fetch.

Ad 1.

I suspect that it is now sorting the reduce output. In 0.8.x this operation performs poorly, especially when run on a single server. So I advise patience, and giving the process as much CPU and RAM as possible. For the future, it is also much better to run the fetcher in non-parsing mode and run "nutch parse" afterwards as a separate step.
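
As a rough sketch of that two-step approach: the segment path below is made up, and the "-noParsing" flag is what I would expect the 0.8.x fetcher to accept, so please check the usage message printed by "bin/nutch fetch" on your version before relying on it. The snippet only prints the two commands, it does not run them.

```shell
# Hypothetical segment path; substitute the segment you actually generated.
SEGMENT="${SEGMENT:-crawl/segments/20060101000000}"

# Step 1: fetch only, without parsing during the fetch
# (assumes the fetcher accepts -noParsing, as in 0.8.x).
FETCH_CMD="bin/nutch fetch ${SEGMENT} -noParsing"

# Step 2: parse the fetched content afterwards as a separate job.
PARSE_CMD="bin/nutch parse ${SEGMENT}"

# Print the commands so they can be reviewed before being run by hand.
echo "$FETCH_CMD"
echo "$PARSE_CMD"
```

Splitting the work this way means that a slow or failing parse step does not put the fetched content at risk, since the fetch has already completed by the time parsing starts.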

If the disk is mounted with the default options, you may try changing them on the fly to "async,noatime"; check the "mount" man page for details on how to do this on a live system. Of course, the price is that if the system crashes you are likely to lose a lot more data ...
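
A minimal sketch of such a remount on Linux, assuming the segments live on a filesystem mounted at /data (a made-up mount point, adjust it to your setup). The remount needs root, so the snippet only prints the command instead of running it.

```shell
# Hypothetical mount point of the filesystem that holds the Nutch segments.
MOUNT_POINT="${MOUNT_POINT:-/data}"

# Show the mount options currently in effect for that filesystem, if it exists.
mount | grep " ${MOUNT_POINT} " || true

# Remount in place with asynchronous writes and no access-time updates.
# Printed rather than executed: run it yourself as root once you accept the risk.
REMOUNT_CMD="mount -o remount,async,noatime ${MOUNT_POINT}"
echo "$REMOUNT_CMD"
```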

Ad 2.

Unfortunately, it is not currently possible to keep partial results, so a fetch that is killed cannot be resumed and has to be started again.

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

