I have run crawls with topN as large as 4 million against a crawldb of ~80 GB, and they worked fine without any such issue. Maybe your hardware / cluster is not capable of handling a load above 500. Note that if topN is low, then no matter how many fetcher threads you create, you won't be able to increase the number of URLs crawled per round. Also, since a considerable amount of time is spent in the generate and update phases, the overall crawl rate will be low. If you are planning to keep using the same machine, you will have to work with lower values (and thus expect a lower crawl rate).
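To make the topN point concrete, here is a rough back-of-the-envelope sketch in Python. It is not anything Nutch itself computes. The function name, the fetch rate of 100 URLs/s and the 600 s of generate + update overhead per round are all made-up numbers for illustration, and treating that overhead as a fixed cost per round is a simplifying assumption.

```python
def effective_crawl_rate(top_n, fetch_rate, overhead_seconds):
    """Estimate URLs/second over one full generate -> fetch -> update round.

    top_n            -- URLs selected for the round (the -topN value)
    fetch_rate       -- hypothetical raw fetcher throughput in URLs/second
    overhead_seconds -- hypothetical time spent in generate + update,
                        assumed here to be a fixed cost per round
    """
    fetch_seconds = top_n / fetch_rate
    return top_n / (overhead_seconds + fetch_seconds)


if __name__ == "__main__":
    # Illustrative values only; measure your own cluster before relying on them.
    for top_n in (500, 50_000, 4_000_000):
        rate = effective_crawl_rate(top_n, fetch_rate=100.0, overhead_seconds=600.0)
        print(f"topN={top_n:>9,}: about {rate:.2f} URLs/s overall")
```

Under these assumptions the fixed generate/update time dominates a small round, so the overall rate stays far below what the fetcher could deliver, and adding fetcher threads only shortens the part of the round that is already short.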
thanks,
Tejas Patil

On Wed, Jan 30, 2013 at 8:06 PM, Lewis John Mcgibbney <[email protected]> wrote:

> You are not getting very many URLs!
>
> On Tue, Jan 29, 2013 at 8:29 PM, peterbarretto <[email protected]> wrote:
>
> > 2013-01-29 08:44:35,014 INFO crawl.CrawlDbReader - TOTAL urls: 96404
> > 2013-01-29 08:44:35,018 INFO crawl.CrawlDbReader - status 1 (db_unfetched): 85672

