Stefan,

We have seen the crawler crashing too, but have never been able to pin-point why. We made a "brute-force" (read: very inelegant) workaround: a script runs just before the fetcher, removes all the domains that were unreachable or blocked in the last few days, and populates the DNS with entries that are known good -- this stopped the crashes and cut our crawl time in half.
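As a rough sketch of the idea (the class and file names below are hypothetical, not our actual script): resolve every host due to be fetched, drop the ones that fail, and leave the rest warm in the local resolver.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.UnknownHostException;

public class DnsPrecheck {
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader("fetch-hosts.txt"));
        PrintWriter good = new PrintWriter(new FileWriter("fetch-hosts.good.txt"));
        PrintWriter bad = new PrintWriter(new FileWriter("fetch-hosts.bad.txt"));
        String host;
        int ok = 0, dropped = 0;
        while ((host = in.readLine()) != null) {
            host = host.trim();
            if (host.length() == 0) continue;
            try {
                // resolving here also warms the local resolver / BIND cache
                InetAddress.getByName(host);
                good.println(host);
                ok++;
            } catch (UnknownHostException e) {
                // unreachable/blocked host -- leave it out of the next fetch
                bad.println(host);
                dropped++;
            }
        }
        in.close();
        good.close();
        bad.close();
        System.out.println(ok + " hosts resolvable, " + dropped + " dropped");
    }
}

In practice this would have to read whatever fetch-list format is actually in use; the plain text files above are just stand-ins.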
Given that we don't use the WebDB anymore it is a very specific solution, but one that has proved successful. Maybe someone can come up with a more elegant solution based on our collective experience.

-----Original Message-----
From: Stefan Groschupf [mailto:[EMAIL PROTECTED]
Sent: Wednesday, August 03, 2005 4:20 AM
To: [email protected]
Subject: dns lookup cache?

Hi there,

does Nutch cache DNS lookups anywhere? I found this paper, and section 3.7 gives some very interesting information. We notice that our crawlers often crash after a series of unknown host exceptions. We already have one dual-CPU box with a 1 Gbit network connection running BIND.

So I have two questions: Do people think the Java domain lookups may be a bottleneck that crashes the crawlers? Other crawlers have a kind of DNS cache -- would it make sense to introduce that to Nutch as well?

Thanks for any comments,
Stefan
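For what it is worth, the JVM already does some DNS caching of its own inside InetAddress; the lifetimes are controlled by the networkaddress.cache.ttl and networkaddress.cache.negative.ttl security properties, settable in java.security or programmatically before the first lookup. A minimal sketch, with example TTL values that are not a recommendation:

import java.net.InetAddress;
import java.security.Security;

public class DnsCacheTtl {
    public static void main(String[] args) throws Exception {
        // Must be set before the first lookup; the values here are examples.
        // "networkaddress.cache.ttl" controls how long successful lookups are
        // kept, "networkaddress.cache.negative.ttl" how long failures are kept.
        Security.setProperty("networkaddress.cache.ttl", "3600");
        Security.setProperty("networkaddress.cache.negative.ttl", "10");

        // Lookups after this point go through the JVM-level InetAddress cache.
        System.out.println(InetAddress.getByName("www.apache.org"));
    }
}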
