I've got the default plugins enabled:

nutch-extensionpoints|protocol-httpclient|urlfilter-regex|parse-(text|html|js)|index-basic|query-(basic|site|url)

And kill -QUIT works great, right up until the process stalls; after that it no longer produces a dump. Signals in general seem to have trouble with the java process: the only way to stop a running fetch with numerous threads is to kill -9 it.

I'll be trying with -noparse and a few different profilers in the next bit to see what happens.

Doug Cutting wrote:
Ken van Mulder wrote:

Initially, it's able to reach ~25 pages/s with 150 threads. The fetcher gets progressively slower though, dropping to about ~15 pages/s after 2-3 hours, and it continues to slow down. I've seen a few references on these lists to the issue, but I'm not clear on whether it's expected behaviour or whether there's a solution to it. I've also noticed that the process takes up more and more memory as it runs. Is this expected as well?


What parse plugins do you have enabled?

The best way to diagnose these problems is to 'kill -QUIT' an offending fetcher process. This will dump the stack of every fetcher thread. The dump will likely look quite different at the start of your run than later in the run, and that difference should point to the problem.
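
To make an early dump and a late dump easier to compare, one option is to tally threads by their topmost stack frame. The sketch below is only an illustration and is not part of Nutch: the class name ThreadDumpSummary and the method countTopFrames are hypothetical, and it uses the standard Thread.getAllStackTraces() call, so it only sees threads in the JVM that runs it. It would have to be invoked from inside the fetcher process (for example from a monitoring thread) to be useful there.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Nutch: summarizes where threads are stuck.
class ThreadDumpSummary {

    // Count how many threads have each "class.method" as their top stack frame.
    static Map<String, Integer> countTopFrames(Map<Thread, StackTraceElement[]> traces) {
        Map<String, Integer> counts = new TreeMap<>();
        for (StackTraceElement[] stack : traces.values()) {
            if (stack.length == 0) {
                continue; // no Java frames available for this thread
            }
            String top = stack[0].getClassName() + "." + stack[0].getMethodName();
            counts.merge(top, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Summarize the threads of the current JVM only.
        Map<String, Integer> counts = countTopFrames(Thread.getAllStackTraces());
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

If the count for one frame (a parser method, say, or a socket read) grows between the start of the run and the point where throughput drops, that frame is a reasonable place to start looking.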

In the past I have seen these symptoms primarily with parser plugins. I have also seen threads hang indefinitely in a socket read, but that is much rarer.

Doug



--
Ken van Mulder
Wavefire Technologies Corporation

http://www.wavefire.com
250.717.0200 (ext 113)
