Why would local DNS caching help? It only works if you crawl the same
sites often... in which case politeness throttles you anyway.

If your segments contain only (or mostly) different sites, it is not
really going to help.
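
For what it's worth, a quick way to see whether a cache would even get
hits is to count distinct hosts in the fetch list. Here is a minimal
Java sketch (the file name and one-URL-per-line format are assumptions);
a URL-to-host ratio close to 1 means nearly every lookup would be a
cache miss anyway.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

// Sketch: estimate host reuse in a fetch list (one URL per line).
public class HostReuse {
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader("fetchlist.txt"));
        Set<String> hosts = new HashSet<String>();
        int urls = 0;
        String line;
        while ((line = in.readLine()) != null) {
            try {
                String host = URI.create(line.trim()).getHost();
                if (host != null) {
                    hosts.add(host);
                    urls++;
                }
            } catch (IllegalArgumentException skip) {
                // ignore malformed lines
            }
        }
        in.close();
        if (hosts.isEmpty()) return;
        System.out.printf("%d urls across %d hosts (reuse factor %.2f)%n",
            urls, hosts.size(), (double) urls / hosts.size());
    }
}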

So far I have not seen my quad core + 100 Mb/s + pseudo-distributed
Hadoop go faster than 10 fetches/s... Let me check the DNS and I will
tell you.
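
In case it helps, this is roughly how I would check it (hostnames are
placeholders): time a cold and then a warm lookup per host. The JVM
caches successful lookups internally, so the warm call should return
immediately; the cold times are what every lookup costs when the hosts
don't repeat.

import java.net.InetAddress;
import java.net.UnknownHostException;

// Sketch: compare a first (cold) and repeated (warm) DNS lookup per host.
public class DnsCheck {
    public static void main(String[] args) throws UnknownHostException {
        String[] hosts = { "example.org", "example.com", "example.net" };
        for (String host : hosts) {
            long cold = time(host);   // goes out to the resolver
            long warm = time(host);   // served from the JVM's own cache
            System.out.printf("%s cold=%dms warm=%dms%n", host, cold, warm);
        }
    }

    private static long time(String host) throws UnknownHostException {
        long start = System.nanoTime();
        InetAddress.getByName(host);
        return (System.nanoTime() - start) / 1000000L;
    }
}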

I vote for 100 fetches/s, not sure how to get there though.
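
Back-of-envelope, assuming a 5 s per-host politeness delay (check your
own fetcher.server.delay): the fetcher can hit each host only once per
delay interval, so sustaining the target rate takes roughly rate x delay
distinct hosts in flight.

// Sketch: how many distinct hosts must be fetched concurrently to sustain
// a target rate under a per-host politeness delay (values are assumptions).
public class FetchRateEstimate {
    public static void main(String[] args) {
        double targetFetchesPerSec = 100.0;
        double politenessDelaySec = 5.0;   // assumed per-host crawl delay
        double hostsNeeded = targetFetchesPerSec * politenessDelaySec;
        System.out.printf("Need ~%.0f distinct hosts in flight%n", hostsNeeded);
    }
}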



2009/11/24, Dennis Kubes <ku...@apache.org>:
> Hi Mark,
>
> I just put this up on the wiki.  Hope it helps:
>
> http://wiki.apache.org/nutch/OptimizingCrawls
>
> Dennis
>
>
> Mark Kerzner wrote:
>> Hi, guys,
>>
>> my goal is to do my crawls at 100 fetches per second, observing, of
>> course, polite crawling. But when the URLs are all on different domains,
>> what theoretically would stop some software from downloading from 100
>> domains at once, achieving the desired speed?
>>
>> But, whatever I do, I can't make Nutch crawl at that speed. Even if it
>> starts at a few dozen URLs/second, it slows down at the end (as discussed
>> by many and by Krugler).
>>
>> Should I write something of my own, or are there fast crawlers?
>>
>> Thanks!
>>
>> Mark
>>
>

-- 
Sent from my mobile

-MilleBii-
