Why would local DNS caching help? It only works if you crawl the same site often... in which case you are throttled by politeness anyway.
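To make the point concrete, here is a minimal sketch of the kind of in-process DNS cache being discussed (this is an illustration, not Nutch's actual resolver code): repeated lookups of the same host are served from memory, so the cache only pays off when the fetch list revisits the same hosts.

```python
import socket
from functools import lru_cache

def make_cached_resolver(resolver=socket.gethostbyname, maxsize=10_000):
    """Wrap any hostname -> IP function with an in-memory LRU cache.

    The underlying resolver is only called once per distinct hostname;
    repeat lookups hit the cache. With mostly-distinct hosts per segment,
    nearly every lookup is a miss and the cache buys almost nothing.
    """
    return lru_cache(maxsize=maxsize)(resolver)

# Demonstrate the cache with a fake resolver that counts real lookups.
calls = []
def fake_resolver(host):
    calls.append(host)
    return "192.0.2.1"  # documentation address (RFC 5737), not a real IP

resolve = make_cached_resolver(fake_resolver)
resolve("example.com")
resolve("example.com")  # served from cache; fake_resolver not called again
```

After the two calls above, `calls` holds a single entry: the second lookup never reached the resolver.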
If your segments contain only (or mainly) different sites, it is not really going to help. So far I have not seen my quad-core + 100 Mb/s + pseudo-distributed Hadoop setup go faster than 10 fetches/s... Let me check the DNS and I will report back. I vote for 100 fetches/s, though I am not sure how to get there.

2009/11/24, Dennis Kubes <ku...@apache.org>:
> Hi Mark,
>
> I just put this up on the wiki. Hope it helps:
>
> http://wiki.apache.org/nutch/OptimizingCrawls
>
> Dennis
>
>
> Mark Kerzner wrote:
>> Hi, guys,
>>
>> my goal is to do my crawls at 100 fetches per second, observing, of
>> course, polite crawling. But when the URLs are all on different domains,
>> what theoretically would stop some software from downloading from 100
>> domains at once, achieving the desired speed?
>>
>> But whatever I do, I can't make Nutch crawl at that speed. Even if it
>> starts at a few dozen URLs/second, it slows down at the end (as discussed
>> by many, and by Krugler).
>>
>> Should I write something of my own, or are there faster crawlers?
>>
>> Thanks!
>>
>> Mark
>>

--
Sent from my mobile
-MilleBii-
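The arithmetic behind Mark's question can be sketched as follows (a back-of-the-envelope model, not Nutch's actual scheduler): if politeness allows at most one fetch per host every d seconds, then the crawl's ceiling is the number of distinct hosts being fetched in parallel divided by d. The delay value used below is an assumption for illustration, not a measured Nutch setting.

```python
def max_polite_rate(distinct_hosts: int, politeness_delay_s: float) -> float:
    """Upper bound on fetches/s when each host is hit at most once per delay.

    Each of the distinct hosts can contribute at most 1/delay fetches per
    second, so the aggregate ceiling is hosts / delay. Bandwidth, DNS, and
    slow servers can only push the real rate below this bound.
    """
    return distinct_hosts / politeness_delay_s

# With a 1 s per-host delay, 100 fetches/s needs >= 100 hosts in flight.
rate = max_polite_rate(distinct_hosts=100, politeness_delay_s=1.0)
```

This is why a fetch list spread over many distinct domains is the precondition for high throughput: with few hosts, the politeness delay alone caps the rate, no matter how fast the hardware is.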