What we were seeing was that the DNS server (bind 9x..) cached the
addresses in memory, and because we were caching so many addresses on a
single DNS server it would eat up memory and eventually begin swapping
to disk. When this occurred the server load got up to 1.5 and the iowait
was near 100%. Basically it stalled the box. Requests were still
getting through, but very slowly. Our solution (at least temporarily)
was to restart the bind service (not the box, just the daemon) every
couple of hours to flush the memory.
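
For anyone wanting to automate something similar, here is a rough
sketch in Python. Note that it is not what we ran: instead of
restarting the daemon it calls `rndc flush`, which asks a running named
to drop its cache. It assumes rndc is installed and configured for the
local server, and the thread does not establish whether a flush frees
as much memory as a restart does. The function names and the injectable
`run`/`sleep` parameters are made up for illustration.

```python
import subprocess
import time

# Assumption: rndc is set up to talk to the local named instance.
FLUSH_COMMAND = ["rndc", "flush"]


def flush_cache(run=subprocess.run):
    """Ask named to drop its cache; True when the command exits with 0."""
    result = run(FLUSH_COMMAND, check=False)
    return result.returncode == 0


def flush_periodically(interval_seconds, rounds, run=subprocess.run,
                       sleep=time.sleep):
    """Flush the cache `rounds` times, sleeping between flushes."""
    outcomes = []
    for i in range(rounds):
        outcomes.append(flush_cache(run))
        if i < rounds - 1:
            # Wait before the next flush, but not after the last one.
            sleep(interval_seconds)
    return outcomes
```

Something like `flush_periodically(2 * 60 * 60, rounds=12)` would match
the "every couple of hours" interval for one day. A cron entry would do
the same job with less code.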
As for load on the boxes, we are seeing very minimal loads (around .08,
with no iowait). We have about 55 fetchers running (5 on each of 11
nodes) and right now we are bandwidth bound on a 2Mbps pipe. So maybe
it is just that we don't have enough load on each machine to see the
kind of waits that you are seeing. Is your system distributed or on a
single machine?
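
To put a number on "bandwidth bound", here is a quick back-of-the-
envelope calculation in Python. It assumes the pipe is shared evenly
across all fetchers and that 1 Mbps means 1000 kbit/s; neither is
measured, and the function name is made up for illustration.

```python
def per_fetcher_share(link_mbps, nodes, fetchers_per_node):
    """Return (total fetchers, kbit/s per fetcher) for an evenly shared link."""
    fetchers = nodes * fetchers_per_node
    kbps_each = link_mbps * 1000 / fetchers
    return fetchers, kbps_each


fetchers, kbps_each = per_fetcher_share(link_mbps=2, nodes=11,
                                        fetchers_per_node=5)
print(f"{fetchers} fetchers, about {kbps_each:.1f} kbit/s each")
```

At roughly 36 kbit/s per fetcher, each box spends most of its time
waiting on the network, which fits with the low loads we see.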
Dennis
Stefan Neufeind wrote:
Hi Dennis,
thank you for the answer. Hmm, it could theoretically be that. But to
prevent this, the server already does all its resolving on the local
machine. Also, I wonder about the CPU load moving to "system". I
suspected heavy disk access or so, but I don't know how/when the
fetcher writes data to disk etc.
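
One way to check where the CPU time goes during such a stall is to
sample the aggregate "cpu" line of /proc/stat twice and compare. Below
is a minimal sketch in Python, assuming a Linux box; older kernels
report fewer columns, so missing fields are treated as 0. The function
names are made up for illustration.

```python
# Column order of the aggregate "cpu" line in /proc/stat on Linux.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq",
          "steal")


def parse_cpu_line(line):
    """Turn a 'cpu ...' line into a dict of jiffy counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def cpu_breakdown(before, after):
    """Percentage of elapsed CPU time spent in each state between samples."""
    deltas = {name: after.get(name, 0) - before.get(name, 0)
              for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate cpu line."""
    with open(path) as handle:
        return handle.readline()
```

Taking one sample with `parse_cpu_line(read_cpu_line())`, waiting a few
seconds during a stall, taking a second sample and passing both to
`cpu_breakdown` would show whether the time is going to "system" or to
"iowait". A high "iowait" share would point at disk, a high "system"
share at kernel work.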
Regards,
Stefan
Dennis Kubes wrote:
Is this possibly a DNS issue? We are running a 5M page crawl and are
seeing very heavy DNS load. Just a thought.
Dennis
Stefan Neufeind wrote:
Hi,
I've noticed here that nutch is fetching quite a number of URLs from a
long list (about 25,000). But from time to time nutch is "waiting" for
10 seconds or so. Nothing is locked, but system load is at 99.9% then.
Is nutch writing fetched data or the index to disk at that stage? Is
there any way to optimize this step, e.g. by writing more often and
performing the write in the "background", or by caching even more in
memory instead of flushing to disk?
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general