Stefan,

We have seen the crawler crash as well, but have never been able to pinpoint
why. We made a "brute-force" (read: very inelegant) workaround. A script runs
just before the fetcher, removes all the domains that were unreachable or
blocked in the last few days, and populates the DNS with entries that are
good. This stopped the crashes and cut crawl time by half.
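
A minimal sketch of the filtering half of such a script, in Java. This is not
our actual script and not Nutch code: the class name FetchListFilter, the
method filter, and the idea of passing the bad hosts in as a lowercase set are
all assumptions made for illustration.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not part of Nutch): drops URLs whose host was
// recently recorded as unreachable or blocked.
public class FetchListFilter {

    // Returns only the URLs whose host is not in badHosts.
    // badHosts is assumed to contain lowercase host names.
    // URLs that cannot be parsed, or that have no host, are dropped too.
    public static List<String> filter(List<String> urls, Set<String> badHosts) {
        List<String> kept = new ArrayList<>();
        for (String url : urls) {
            String host;
            try {
                host = URI.create(url).getHost();
            } catch (IllegalArgumentException e) {
                continue; // malformed URL, skip it
            }
            if (host != null && !badHosts.contains(host.toLowerCase(Locale.ROOT))) {
                kept.add(url);
            }
        }
        return kept;
    }
}
```

The list of bad hosts would come from the fetcher logs of the previous days;
how that list is collected is left out here.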

Given that we don't use the WebDB anymore, it's a very specific solution, but
one that has proved successful. Maybe someone can come up with a more
elegant solution based on our collective experience.


-----Original Message-----
From: Stefan Groschupf [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, August 03, 2005 4:20 AM
To: [email protected]
Subject: dns lookup cache?

Hi there,
Does Nutch cache DNS lookups in any way?
I found this paper, and section 3.7 gives some very interesting information.
We notice that our crawlers often crash after a series of unknown host
exceptions.
We already have one dual-CPU box with a 1 Gbit network connection running
BIND.

So I have two questions:
Do people think the Java domain lookup may be a bottleneck that crashes the
crawlers?
Other crawlers have a kind of DNS cache. Would it make sense to introduce
one to Nutch as well?
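
To make the second question concrete, here is a minimal sketch of what such a
cache might look like in Java. It is an illustration under stated assumptions,
not existing Nutch code: the class DnsCache, its Resolver interface, and the
single TTL shared by positive and negative entries are all hypothetical.
Failed lookups are cached too, so a run of unknown hosts is not retried on
every request. Separately, the JVM has its own lookup cache, tunable through
the networkaddress.cache.ttl and networkaddress.cache.negative.ttl security
properties.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch (not Nutch code): caches successful and failed lookups.
public class DnsCache {

    // Pluggable lookup so the cache can be exercised without a network.
    public interface Resolver {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    private static final class Entry {
        final InetAddress address;   // null means "host was unknown"
        final long expiresAtMillis;

        Entry(InetAddress address, long expiresAtMillis) {
            this.address = address;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final Resolver resolver;
    private final long ttlMillis;

    public DnsCache(Resolver resolver, long ttlMillis) {
        this.resolver = resolver;
        this.ttlMillis = ttlMillis;
    }

    // Convenience constructor that uses the JVM's own resolver.
    public DnsCache(long ttlMillis) {
        this(InetAddress::getByName, ttlMillis);
    }

    // Returns the cached address, resolving only on a miss or after expiry.
    // Throws UnknownHostException for hosts cached as unknown.
    public InetAddress lookup(String host) throws UnknownHostException {
        String key = host.toLowerCase(Locale.ROOT);
        long now = System.currentTimeMillis();
        Entry entry = cache.get(key);
        if (entry == null || entry.expiresAtMillis <= now) {
            InetAddress address;
            try {
                address = resolver.resolve(key);
            } catch (UnknownHostException ex) {
                address = null; // remember the failure as a negative entry
            }
            entry = new Entry(address, now + ttlMillis);
            cache.put(key, entry);
        }
        if (entry.address == null) {
            throw new UnknownHostException(host);
        }
        return entry.address;
    }
}
```

Two threads that miss on the same host at the same time may both resolve it.
That is harmless here, but a real implementation might want to avoid it.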

Thanks for any comments.
Stefan





_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
