Hi Tejas,

I am currently running Nutch 1.6 on Windows 7 (Pentium dual core 2.8 GHz, 2 GB RAM). I will be using Amazon EC2 servers later for crawling.
What was your hardware when you ran 4 million URLs with 80 GB of data? Will Nutch 2.1 give a faster crawl speed than 1.6?

Tejas Patil wrote
> I have run crawls with topN as large as 4 million while having a crawldb of
> ~80 GB. It worked fine without any such issue.
> Maybe the hardware / cluster you have is not capable of handling load
> above 500. Note that if topN is low, then no matter how many fetcher
> threads you create, you won't be able to increase #crawls. Also, as there
> is a considerable amount of time spent in the generate and update phases,
> the overall crawl rate will be low. If you are planning to use the same
> machine, you will have to work with lower values (and thus expect a lower
> crawl rate).
>
> thanks,
> Tejas Patil
>
> On Wed, Jan 30, 2013 at 8:06 PM, Lewis John Mcgibbney <lewis.mcgibbney@> wrote:
>
>> You are not getting very many URLs!
>>
>> On Tue, Jan 29, 2013 at 8:29 PM, peterbarretto <peterbarretto08@> wrote:
>>
>> > 2013-01-29 08:44:35,014 INFO crawl.CrawlDbReader - TOTAL urls: 96404
>> > 2013-01-29 08:44:35,018 INFO crawl.CrawlDbReader - status 1 (db_unfetched): 85672

--
View this message in context: http://lucene.472066.n3.nabble.com/increase-the-number-of-fetches-at-agiven-time-on-nutch-1-6-or-2-1-tp4036499p4037637.html
Sent from the Nutch - User mailing list archive at Nabble.com.

