Robin Haswell wrote:
Hi there
My fetch process must have nearly finished, and now it's slaying the
server. I have a horrible feeling it's hung. I have the parse option
enabled in the configuration, so it could be doing that. I've fetched a
lot of documents (it took 1 week at 250KB/s).
I guess I have two questions:
1. Am I going to have to kill the fetch and start again? It's running at
100% CPU and 68% memory. Usage only spiked once the debug messages
about fetching stopped appearing, and it took a few hours after the fetch
finished to reach this level.
2. Can I resume an unfinished fetch where I left off before? If I have
to kill this I can't bear the thought of waiting another week to fetch.
Ad 1.
I suspect that it's sorting the reduce output now ... in 0.8.x this
operation has poor performance, especially when run on a single server.
So I advise patience, and giving the process as much CPU and RAM as
possible. For the future, it's also much, much better to run the fetcher
in non-parsing mode and run "nutch parse" afterwards as a separate step.
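
A rough sketch of the two-step approach, not a tested recipe. The segment
path is hypothetical, and the "-noParsing" switch is what I'd expect the
0.8.x fetcher to accept; please check the usage message printed by
"bin/nutch fetch" on your version. By default the script only prints the
commands (NUTCH is set to "echo bin/nutch"); set NUTCH=bin/nutch to run
them for real.

```shell
#!/bin/sh
# Sketch only. Assumptions: the segment path below is hypothetical, and the
# fetcher accepts -noParsing (verify against your version's usage message).
NUTCH="${NUTCH:-echo bin/nutch}"   # dry run by default: prints, does not execute
SEGMENT="${SEGMENT:-crawl/segments/20060101000000}"

fetch_then_parse() {
    # Step 1: fetch only, leaving the content unparsed.
    $NUTCH fetch "$1" -noParsing || return 1
    # Step 2: parse the fetched segment as a separate job.
    $NUTCH parse "$1"
}

fetch_then_parse "$SEGMENT"
```

The point of the split is that if the parse step misbehaves, the fetched
content is already on disk and only the parse has to be repeated.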
If the disk is mounted with the default options, you can try changing
them on the fly to "async,noatime"; check the "mount" man page for details
on how to do this on a live system. Of course, the price is that if
the system crashes you are likely to lose a lot more data ...
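
For illustration, a remount on Linux might look like the sketch below. The
mount point /data is purely hypothetical; substitute the filesystem that
holds your segments. The script prints the command by default; run it with
RUN set to an empty value (and as root) to actually remount.

```shell
#!/bin/sh
# Sketch only. Assumption: /data is a hypothetical mount point for the
# filesystem holding the crawl data.
MOUNT_POINT="${MOUNT_POINT:-/data}"
RUN="${RUN-echo}"   # dry run by default; invoke with RUN= to execute for real

remount_async_noatime() {
    # "remount" changes the options of an already-mounted filesystem in place.
    $RUN mount -o remount,async,noatime "$1"
}

remount_async_noatime "$MOUNT_POINT"
```

Running "mount" with no arguments afterwards should list the options that
are currently in effect for each filesystem.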
Ad 2.
Unfortunately, for now it's not possible to keep partial results, so a
killed fetch can't be resumed.
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com