Hi everyone,

We are preparing to run Nutch on a cluster of 100+ machines (using Hadoop).
At the moment, we use a small cluster of 3 machines to test our setup.

We are trying to determine the optimal parameters for our environment.
One of them is the number of map and reduce tasks, configured in
hadoop-site.xml.
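
For reference, this is roughly what we mean (property names as in the
classic Hadoop configuration; the values below are only illustrative for
our 3-machine test setup, not a recommendation):

```xml
<!-- hadoop-site.xml (illustrative values only) -->
<configuration>
  <property>
    <name>mapred.map.tasks</name>
    <value>6</value>
    <description>Hint for the number of map tasks per job.</description>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>3</value>
    <description>Number of reduce tasks; if the generator uses this,
    it would also determine the number of fetch lists (partitions).
    </description>
  </property>
</configuration>
```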

Question: is it true that the number of fetcher map tasks is determined
by the number of reduce tasks used by the generator? This would make
sense, since the n reduce tasks of the generator produce n partitions
(fetch lists), each of which can then be fetched by one fetcher map task.

We think it would be a good idea to define more fetcher map tasks than
the number of slaves we use. That way, fast slaves can ask the
JobTracker for more work (tasks) as they become available, while slow
slaves (most likely fetching from slow hosts) continue working on their
first tasks (we assume we have enough unique hosts to fetch).

Does this make sense?

Thanks,
Mathijs

_______________________________________________
Nutch-general mailing list
Nutch-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-general
