Hi everyone,

We are preparing to run Nutch on a cluster of 100+ machines (using Hadoop). At the moment we are testing our setup on a small cluster of 3 machines.

We are trying to determine the optimal parameters for our environment. One of them is the number of map and reduce tasks, configured in hadoop-site.xml.

Question: is it true that the number of fetcher map tasks is determined by the number of reduce tasks used by the generator? This would make sense, since the n reduce tasks of the generate job produce n partitions (fetch lists), which can then be fetched by n fetcher map tasks.

We think it would be a good idea to define more fetcher map tasks than we have slaves. That way, fast slaves can ask the jobtracker for more work (tasks) as they become available, while slow slaves (most likely fetching slow hosts) keep working on their first tasks (we assume we have enough unique hosts to fetch). Does this make sense?

Thanks,
Mathijs

_______________________________________________
Nutch-general mailing list
Nutch-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-general
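As a concrete sketch of the idea above, oversubscribing tasks relative to the slave count could be expressed in hadoop-site.xml roughly like this (property names are from the Hadoop 0.x mapred configuration; the values are hypothetical placeholders for a small test cluster, not tuned recommendations):

```xml
<!-- hadoop-site.xml: illustrative values only, not recommendations -->
<configuration>
  <property>
    <name>mapred.reduce.tasks</name>
    <!-- reduce tasks for the generate job: each produces one fetch list,
         so this also bounds the number of fetcher map tasks -->
    <value>9</value>
  </property>
  <property>
    <name>mapred.tasktracker.tasks.maximum</name>
    <!-- concurrent tasks per slave: with 3 slaves and 9 fetch lists,
         faster slaves can pick up extra lists as they finish -->
    <value>2</value>
  </property>
</configuration>
```

If memory serves, the number of fetch lists can also be set per-run on the command line (e.g. a `-numFetchers` style option to the generate command) rather than globally in the config, which may be more convenient for experimenting.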