Hi,

I am using Nutch, running on a cluster of about 20 machines, to index roughly 1 million static HTML pages hosted on a single web server on my LAN. However, whenever I run a fetch, Nutch uses only two map workers, even though the cluster has 20, and it assigns about 90% of the pages to one of them. For example, I generated a fetchlist of 10,000 pages, and one mapper ended up fetching 175 of them while the other fetched 9,000.

What can I do to use more mappers and spread the load more evenly? My web server should be able to handle more simultaneous connections.
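For illustration, here is a minimal Python sketch of one possible cause, which is only an assumption: if the fetchlist is split into partitions by hashing each URL's host name, then every page from a single server lands in the same partition no matter how many machines are available. The helpers `partition_by_host` and `partition_by_url`, the host `intranet.example`, and the partition count are hypothetical. The hash mimics Java's `String.hashCode()` and the `(hash & Integer.MAX_VALUE) % n` rule of Hadoop's `HashPartitioner`; it is not Nutch's actual code.

```python
from collections import Counter
from urllib.parse import urlsplit


def java_string_hash(s: str) -> int:
    """Emulate Java's String.hashCode() (32-bit signed) for ASCII strings."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Convert to a signed 32-bit value, as Java would return it.
    return h - (1 << 32) if h >= (1 << 31) else h


def partition_by_host(url: str, num_partitions: int) -> int:
    """Hypothetical: pick a partition from the URL's host name only."""
    host = urlsplit(url).hostname or ""
    return (java_string_hash(host) & 0x7FFFFFFF) % num_partitions


def partition_by_url(url: str, num_partitions: int) -> int:
    """Hypothetical: same scheme, but keyed on the full URL string."""
    return (java_string_hash(url) & 0x7FFFFFFF) % num_partitions


NUM_PARTITIONS = 20  # assumed: one partition per machine in the cluster

# Hypothetical fetchlist: 10,000 pages, all on one LAN server.
urls = [f"http://intranet.example/page{i}.html" for i in range(10_000)]

by_host = Counter(partition_by_host(u, NUM_PARTITIONS) for u in urls)
by_url = Counter(partition_by_url(u, NUM_PARTITIONS) for u in urls)

print("host-keyed:", sorted(by_host.values()))  # host-keyed: [10000]
print("url-keyed: ", sorted(by_url.values()))
```

Under this assumption, keying on the host is what keeps every page on one partition; keying on the full URL spreads the same 10,000 pages across several partitions, at the cost of no longer grouping a host's URLs together for politeness limits.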

Thanks,

Matei Zaharia
