Dennis Kubes wrote:
True, the numFetchers option wouldn't be needed there; I was just trying to illustrate.

Although I have never used it myself (never needed to because of its default behavior), I guess it could be used if you want only one machine to fetch all of the urls: you could do a -numFetchers 1.

There are other reasons, too. If you have a cluster with limited capacity (e.g. 10 map slots) and you still want to run other jobs while the fetcher is running, you may specify -numFetchers 2, which keeps 8 map slots available for other jobs.
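
The slot arithmetic above can be sketched as follows. This is only an illustration, not Nutch or Hadoop code: the function name free_map_slots is hypothetical, and it assumes each fetcher map task occupies exactly one map slot for the duration of the fetch.

```python
def free_map_slots(total_slots: int, num_fetchers: int) -> int:
    """Return the map slots left for other jobs while the fetcher runs.

    Assumption: one fetcher map task per -numFetchers partition, and
    each running map task holds exactly one slot.
    """
    # The fetcher cannot occupy more slots than the cluster has.
    busy = min(num_fetchers, total_slots)
    return total_slots - busy


if __name__ == "__main__":
    # The example from the text: 10 map slots, -numFetchers 2.
    print(free_map_slots(10, 2))
```

With the numbers from the example (10 slots, 2 fetchers) this leaves 8 slots; asking for more fetchers than slots leaves none free.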

Another situation: presumably your config sets the default number of map tasks equal to the cluster capacity, so a fetch job allocates all map slots when it starts. However, if you run some heavy plugins inside the Fetcher context (urlfilters, parsers, etc.), you may want to limit the maximum amount of data handled by a single map task by creating more map tasks than necessary.
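
To make the effect of extra map tasks concrete, here is a rough sketch of the per-task load. It is not Nutch code: the function name urls_per_task and the URL counts are made up for illustration, and it assumes the fetch list is split roughly evenly across tasks, which real partitioning (e.g. by host) will not guarantee.

```python
def urls_per_task(total_urls: int, num_tasks: int) -> int:
    """Upper bound on URLs per map task under an even split (assumed)."""
    if num_tasks <= 0:
        raise ValueError("num_tasks must be positive")
    # Integer ceiling division: the largest share any one task receives.
    return (total_urls + num_tasks - 1) // num_tasks


if __name__ == "__main__":
    total = 1_000_000  # hypothetical fetch list size
    # One task per slot on a 10-slot cluster vs. four tasks per slot.
    for tasks in (10, 40):
        print(tasks, urls_per_task(total, tasks))
```

Under these assumptions, going from 10 to 40 tasks cuts the data each task has to push through the plugins from 100000 to 25000 URLs, at the cost of the tasks running in several waves.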

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
