I've got the default plugins enabled:
nutch-extensionpoints|protocol-httpclient|urlfilter-regex|parse-(text|html|js)|index-basic|query-(basic|site|url)
kill -QUIT works fine right up until the process stalls; after that it
no longer produces a dump. kill seems to have trouble with the Java
process in general: the only way to stop a running fetch with many
threads is to kill -9 it.
I'll be trying with -noparse and a few different profilers in the next
bit to see what happens.
Doug Cutting wrote:
Ken van Mulder wrote:
Initially, it's able to reach ~25 pages/s with 150 threads. The fetcher
gets progressively slower though, dropping to about 15 pages/s
after 2-3 hours or so, and it continues to slow down. I've seen a
few references on these lists to the issue, but I'm not clear on whether
it's expected behaviour or whether there's a solution to it. I've also
noticed that the process takes up more and more memory as it runs. Is
this expected as well?
What parse plugins do you have enabled?
The best way to diagnose these problems is to 'kill -QUIT' an offending
fetcher process. This will dump the stack of every fetcher thread. This
will likely look quite different at the start of your run than later in
the run, and that difference should point to the problem.
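One way to compare an early dump with a late one is to count how many threads are sitting in each top stack frame. The following is only a rough sketch: it assumes the dump (which the JVM writes to the process's standard output) has been saved to a file, and that it uses the usual HotSpot layout of a quoted thread name followed by "at ..." frame lines. The class name DumpSummary and the method countTopFrames are made up for illustration and are not part of Nutch.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

class DumpSummary {
    /**
     * Counts threads by their topmost stack frame.
     * Assumes each thread starts with a header line beginning with a
     * double quote, followed by indented "at ..." frame lines.
     */
    static Map<String, Integer> countTopFrames(String dump) {
        Map<String, Integer> counts = new TreeMap<>();
        boolean awaitingTopFrame = false;
        for (String raw : dump.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.startsWith("\"")) {
                // Thread header: the next "at ..." line is the top frame.
                awaitingTopFrame = true;
            } else if (awaitingTopFrame && line.startsWith("at ")) {
                counts.merge(line.substring(3), 1, Integer::sum);
                awaitingTopFrame = false;
            } else if (line.isEmpty()) {
                // Blank line ends a thread entry, even one with no frames.
                awaitingTopFrame = false;
            }
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to a file holding one saved thread dump.
        String dump = new String(Files.readAllBytes(Paths.get(args[0])));
        countTopFrames(dump).forEach(
            (frame, n) -> System.out.println(n + "\t" + frame));
    }
}
```

Running it once on a dump from the start of the fetch and once on a dump from a few hours in should show which frames the threads pile up in as the run slows down.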
In the past I have seen these symptoms primarily with parser plugins. I
have also seen threads hang indefinitely in a socket read, but that is
much rarer.
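For the socket-read case, the general JDK-level guard is a read timeout, so a silent server causes an exception instead of a thread that blocks forever. The sketch below uses plain java.net.Socket only to illustrate the idea; it is not how the Nutch protocol plugins are written, and the class name TimedRead and method readFirstByte are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;

class TimedRead {
    /**
     * Connects to host:port and reads a single byte.
     * Throws java.net.SocketTimeoutException if connecting or reading
     * takes longer than timeoutMs, rather than blocking indefinitely.
     */
    static int readFirstByte(String host, int port, int timeoutMs)
            throws IOException {
        try (Socket socket = new Socket()) {
            // Bound the connect phase.
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            // Bound every subsequent blocking read on this socket.
            socket.setSoTimeout(timeoutMs);
            InputStream in = socket.getInputStream();
            return in.read();
        }
    }
}
```

Without the setSoTimeout call the read would wait for as long as the remote side stays silent, which matches the hung threads described above.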
Doug
--
Ken van Mulder
Wavefire Technologies Corporation
http://www.wavefire.com
250.717.0200 (ext 113)