What we were seeing is that the DNS server (BIND 9.x) cached the resolved addresses in memory, and because we were caching so many addresses on a single DNS server, it ate up memory and eventually began swapping to disk. When this happened the server load rose to 1.5 and iowait was near 100%. Basically it stalled the box: requests were still getting through, but very slowly. Our solution (at least temporarily) was to restart the bind service (just the daemon, not the box) every couple of hours to flush the memory.

As for load on the fetcher boxes, we are seeing very minimal loads (around 0.08, with no iowait). We have about 55 fetchers running (5 on each of 11 nodes), and right now we are bandwidth bound on a 2 Mbps pipe. So maybe we just don't have enough load on each machine to see the kind of waits you are seeing. Is your system distributed or on a single machine?
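The "bandwidth bound" remark can be checked with back-of-the-envelope arithmetic. A minimal sketch, assuming the 2 Mbps link is shared evenly across all fetchers and using decimal units (1 Mbps = 1,000,000 bit/s); the function name is illustrative only:

```python
# Rough per-fetcher bandwidth share, using the figures from the message above:
# 11 nodes x 5 fetchers each, all sharing one 2 Mbps link.
NODES = 11
FETCHERS_PER_NODE = 5
LINK_BPS = 2_000_000  # 2 Mbps, assuming decimal megabits


def per_fetcher_share(nodes: int, per_node: int, link_bps: int) -> tuple[int, float, float]:
    """Return (total fetchers, bits/s per fetcher, kilobytes/s per fetcher)."""
    total = nodes * per_node
    bps = link_bps / total          # even split of the link
    kbytes = bps / 8 / 1000         # bits -> bytes -> kB
    return total, bps, kbytes


total, bps, kbytes = per_fetcher_share(NODES, FETCHERS_PER_NODE, LINK_BPS)
print(f"{total} fetchers, ~{bps / 1000:.1f} kbit/s (~{kbytes:.1f} kB/s) each")
```

Under that even-split assumption each fetcher gets roughly 36 kbit/s (about 4.5 kB/s), which is consistent with the boxes sitting near idle while the pipe is saturated.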

Dennis

Stefan Neufeind wrote:
Hi Dennis,

thank you for the answer. Hmm, that could theoretically be it. But to
prevent this, the server already does all of its resolving on the local
machine. I also wonder about the CPU load moving to "system". I suspected
heavy disk access or something similar, but I don't know how or when the
fetcher writes data to disk.
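One way to tell whether the stalls are "system" time or disk wait (the near-100% iowait Dennis describes) is to sample the aggregate `cpu` line of `/proc/stat` twice and compare the deltas. This is a sketch assuming a Linux kernel whose `/proc/stat` reports the `iowait` column as the fifth value; the function names are hypothetical helpers, not part of Nutch:

```python
import time

# Field order of the aggregate "cpu" line in Linux /proc/stat (values are jiffies).
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def parse_cpu_line(line: str) -> dict[str, int]:
    """Parse the aggregate 'cpu ...' line into a dict of counters."""
    parts = line.split()
    if parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def cpu_percentages(before: dict[str, int], after: dict[str, int]) -> dict[str, float]:
    """Percentage of elapsed CPU time spent in each state between two samples."""
    delta = {k: after[k] - before[k] for k in before}
    total = sum(delta.values())
    if total == 0:
        return {k: 0.0 for k in delta}
    return {k: 100.0 * v / total for k, v in delta.items()}


def read_cpu_line(path: str = "/proc/stat") -> str:
    with open(path) as f:
        return f.readline()  # the first line is the aggregate "cpu" line


def sample(interval: float = 1.0) -> dict[str, float]:
    """Take two readings `interval` seconds apart and return per-state percentages."""
    before = parse_cpu_line(read_cpu_line())
    time.sleep(interval)
    after = parse_cpu_line(read_cpu_line())
    return cpu_percentages(before, after)
```

Calling `sample()` during one of the 10-second pauses and comparing `pct["system"]` with `pct["iowait"]` would show whether the box is busy in the kernel or blocked on disk (for example, swapping).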



Regards,
 Stefan

Dennis Kubes wrote:
Is this possibly a DNS issue? We are running a 5M-page crawl and are
seeing very heavy DNS load. Just a thought.
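If repeated lookups are the bottleneck, a crawler can keep a small in-process cache of resolved hosts, and bounding its size avoids the kind of unbounded memory growth described at the top of this thread. A minimal sketch using LRU eviction with a fixed TTL; the class name, the 10,000-entry limit and the 300-second TTL are illustrative assumptions, not Nutch or BIND settings, and `resolve` defaults to the standard library's `socket.gethostbyname`. (On the server side, BIND 9 also has a `max-cache-size` option that may be worth checking for the version in use.)

```python
import socket
import time
from collections import OrderedDict
from typing import Callable


class BoundedDnsCache:
    """Hypothetical LRU + TTL cache for hostname -> IPv4 address lookups."""

    def __init__(self, max_entries: int = 10_000, ttl: float = 300.0,
                 resolve: Callable[[str], str] = socket.gethostbyname,
                 clock: Callable[[], float] = time.monotonic):
        self.max_entries = max_entries
        self.ttl = ttl
        self._resolve = resolve
        self._clock = clock
        # host -> (address, expiry time); insertion order tracks recency of use
        self._entries: "OrderedDict[str, tuple[str, float]]" = OrderedDict()
        self.lookups = 0  # number of real resolver calls, for monitoring

    def get(self, host: str) -> str:
        now = self._clock()
        entry = self._entries.get(host)
        if entry is not None:
            addr, expires = entry
            if now < expires:
                self._entries.move_to_end(host)  # mark as most recently used
                return addr
            del self._entries[host]  # expired: drop it and re-resolve
        addr = self._resolve(host)
        self.lookups += 1
        self._entries[host] = (addr, now + self.ttl)
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the least recently used host
        return addr

    def __len__(self) -> int:
        return len(self._entries)
```

The size cap keeps memory flat no matter how many distinct hosts the crawl touches, at the cost of re-resolving hosts that fall out of the cache.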

Dennis

Stefan Neufeind wrote:
Hi,

I've noticed that Nutch fetches quite a lot of URLs from a long list
(about 25,000), but from time to time it "waits" for 10 seconds or so.
Nothing is locked, but the system load is at 99.9% during that time. Is
Nutch writing fetched data or the index to disk at that stage? Is there
any way to optimize this step, e.g. by writing more often and performing
the write in the "background", or by caching even more in memory instead
of flushing to disk?
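The "write in the background" idea is a standard producer/consumer pattern: fetcher threads hand finished records to a bounded queue and a single writer thread flushes them in batches, so fetching does not stop while data goes to disk. This is a generic Python sketch, not how the Nutch fetcher actually writes its output; `BackgroundWriter`, `batch_size` and the `sink` callback are hypothetical names:

```python
import queue
import threading
from typing import Callable, List

_STOP = object()  # sentinel telling the writer thread to finish


class BackgroundWriter:
    """Batch records on a queue and flush them from a dedicated thread."""

    def __init__(self, sink: Callable[[List[bytes]], None], batch_size: int = 100):
        self._sink = sink  # e.g. a function that appends a batch to a file
        self._batch_size = batch_size
        # Bounded queue: producers block if the writer falls far behind (back-pressure).
        self._queue: "queue.Queue[object]" = queue.Queue(maxsize=10 * batch_size)
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def write(self, record: bytes) -> None:
        self._queue.put(record)

    def close(self) -> None:
        """Flush any remaining records and wait for the writer thread to exit."""
        self._queue.put(_STOP)
        self._thread.join()

    def _run(self) -> None:
        batch: List[bytes] = []
        while True:
            item = self._queue.get()
            if item is _STOP:
                break
            batch.append(item)
            if len(batch) >= self._batch_size:
                self._sink(batch)
                batch = []
        if batch:
            self._sink(batch)  # flush the remainder on shutdown
```

The trade-off is that more data sits in memory between flushes; if the box is already short on RAM, a larger batch or queue can push it into swap, which would make the stalls worse rather than better.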


_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
