Hi Tejas,

I am currently running Nutch 1.6 on Windows 7 with a Pentium dual-core
2.8 GHz CPU and 2 GB of RAM.
I will be using Amazon EC2 servers later for crawling.

What was your hardware when you ran 4 million URLs with 80 GB of data?

Will Nutch 2.1 give a faster crawl speed than 1.6?


Tejas Patil wrote
> I have run crawls with topN as large as 4 million with a crawldb of
> ~80 GB. It worked fine without any such issue.
> Maybe the hardware / cluster you have is not capable of handling a load
> above 500. Note that if topN is low, then no matter how many fetcher
> threads you create, you won't be able to increase the number of fetches.
> Also, as a considerable amount of time is spent in the generate and update
> phases, the overall crawl rate will be low. If you are planning to use the
> same machine, you will have to work with lower values (and thus expect a
> lower crawl rate).
> 
> thanks,
> Tejas Patil
> 
> 
> On Wed, Jan 30, 2013 at 8:06 PM, Lewis John Mcgibbney <lewis.mcgibbney@> wrote:
> 
>> You are not getting very many URLs!
>>
>> On Tue, Jan 29, 2013 at 8:29 PM, peterbarretto <peterbarretto08@> wrote:
>>
>> >
>> > 2013-01-29 08:44:35,014 INFO  crawl.CrawlDbReader - TOTAL urls: 96404
>> >
>> > 2013-01-29 08:44:35,018 INFO  crawl.CrawlDbReader - status 1 (db_unfetched): 85672
>> >
>>





--
View this message in context: 
http://lucene.472066.n3.nabble.com/increase-the-number-of-fetches-at-agiven-time-on-nutch-1-6-or-2-1-tp4036499p4037637.html
Sent from the Nutch - User mailing list archive at Nabble.com.
