I'm running a program whose streaming layer automatically multithreads by detecting the number of cores on the machine. I realize this model is somewhat in conflict with Hadoop's, but nonetheless, that's what I'm doing. For even resource utilization, it would therefore be nice to assign not just one mapper per core, but at most one mapper per machine. I realize that if I saturate the cluster none of this really matters, but consider the following example for clarity: 4-core nodes on a 10-node cluster give 40 slots, fully configured across mappers and reducers (40 slots of each). Say I run this program with just two mappers. It would run much more efficiently (in essentially half the time) if I could force the two mappers onto slots on two separate machines, instead of running the risk that Hadoop assigns them both to the same machine.
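The closest knob I'm aware of is the cluster-wide per-TaskTracker slot cap (assuming an MRv1-era setup; this goes in mapred-site.xml on each node) — but that limits every job on the cluster, not just this one, so it's not quite what I'm after:

```
<!-- mapred-site.xml sketch: cap each TaskTracker at one concurrent map task.
     This is a cluster-wide setting, not a per-job one, so it would throttle
     all jobs, not just my multithreaded streaming job. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
</property>
```
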
Can this be done? Thanks.

________________________________________________________________________________
Keith Wiley     [email protected]     keithwiley.com     music.keithwiley.com

"Yet mark his perfect self-contentment, and hence learn his lesson, that to be
self-contented is to be vile and ignorant, and that to aspire is better than to
be blindly and impotently happy."
                                           --  Edwin A. Abbott, Flatland
________________________________________________________________________________
