I have run crawls with topN as large as 4 million and a crawldb of
~80 GB. They worked fine without any such issue.
Maybe the hardware / cluster you have is not capable of handling a load
above 500. Note that if topN is low, then no matter how many fetcher threads
you create, you won't be able to increase the number of pages crawled,
because each round fetches at most topN URLs. Also, as a considerable
amount of time is spent in the generate and update phases, the overall
crawl rate will be low. If you are planning to use the same machine, you
will have to work with lower values (and thus expect a lower crawl rate).
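
To make the point about topN concrete, here is a rough back-of-the-envelope
sketch in Python. It is not Nutch code: the function name, the fetch rates
and the 600 seconds of generate/update overhead are all made-up numbers for
illustration, and the model simply assumes the overhead per round is fixed.

```python
def crawl_rate(top_n, fetch_rate, overhead_s):
    """Estimate pages per second over one generate/fetch/update round.

    top_n      -- URLs selected for the round (the topN value)
    fetch_rate -- pages/sec the fetcher threads could sustain (assumed)
    overhead_s -- seconds spent in generate + update (assumed fixed)
    """
    fetch_s = top_n / fetch_rate           # time spent actually fetching
    return top_n / (fetch_s + overhead_s)  # overall rate for the round


# Hypothetical numbers: 600 s of generate/update overhead per round.
small = crawl_rate(top_n=500, fetch_rate=50, overhead_s=600)
small_more_threads = crawl_rate(top_n=500, fetch_rate=100, overhead_s=600)
large = crawl_rate(top_n=50000, fetch_rate=50, overhead_s=600)

print(f"topN=500,   50 pages/s fetcher:  {small:.2f} pages/s overall")
print(f"topN=500,   100 pages/s fetcher: {small_more_threads:.2f} pages/s overall")
print(f"topN=50000, 50 pages/s fetcher:  {large:.2f} pages/s overall")
```

Under these assumptions, doubling the fetcher speed at topN=500 barely
changes the overall rate, since nearly all of the round is overhead, while
raising topN lets the fetch time dominate.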

thanks,
Tejas Patil


On Wed, Jan 30, 2013 at 8:06 PM, Lewis John Mcgibbney <
[email protected]> wrote:

> You are not getting very many URLs!
>
> On Tue, Jan 29, 2013 at 8:29 PM, peterbarretto <[email protected]
> >wrote:
>
> >
> > 2013-01-29 08:44:35,014 INFO  crawl.CrawlDbReader - TOTAL urls: 96404
> >
> > 2013-01-29 08:44:35,018 INFO  crawl.CrawlDbReader - status 1 (db_unfetched): 85672
> >
>
