Hi,
I am using Nutch to index about 1 million static HTML pages hosted on a
single server on my LAN, and I am running the crawl on a cluster of ~20
machines. However, whenever I perform a fetch, Nutch uses only two
mappers, even though the cluster has 20 workers, and gives about 90% of
the pages to one of them. For example, I created a fetchlist of 10,000
pages and ended up with one mapper fetching 175 of them and another
fetching 9,000. What can I do to use more mappers and partition the
load more evenly? My web server should be able to handle more
concurrent connections.
Thanks,
Matei Zaharia