Hi All,

I am crawling several large websites, each listed in the seed file by its
homepage URL. The problem is that one of the websites is being crawled
much faster than the others, and as a result the indexed data contains a
disproportionate number of entries for that one website.

I suspect this is happening because the homepage of the website in
question has the largest number of outlinks.

My question is: how can I control the behaviour of Nutch so that it crawls
every host/domain in a balanced way?
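
To make the question concrete, here is a minimal sketch of the kind of
per-host limit I have in mind. It assumes the generator properties
generate.max.count and generate.count.mode listed in nutch-default.xml are
the relevant ones; the value 500 is an arbitrary placeholder, not something
I have tested, and the snippet would go inside the <configuration> element
of conf/nutch-site.xml.

```xml
<!-- Hypothetical override for conf/nutch-site.xml; values are assumptions. -->
<property>
  <name>generate.max.count</name>
  <!-- Assumed cap on URLs per host in a single fetch list; -1 means no limit. -->
  <value>500</value>
</property>
<property>
  <name>generate.count.mode</name>
  <!-- Count the cap per host; "domain" would count per domain instead. -->
  <value>host</value>
</property>
```

If these are the right knobs, I would expect each generate step to pick at
most that many URLs from any one host, but I am not sure this is enough to
keep the index balanced over many rounds.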

I am using Nutch 1.7.

Thanks.
