Is your cluster running MR1 or MR2? On MR1, the CapacityScheduler lets you do this if you use appropriate memory-based requests (see http://search-hadoop.com/m/gnFs91yIg1e). On MR2, depending on the YARN scheduler's resource-request limits configuration, you can have your job request the maximum allowed container size, large enough to soak up all of a node's advertised resources (CPU and memory), so that only one container runs on a host at any given time.
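As a rough sketch of the MR2 approach: the property names below are the standard MR2/YARN ones, but the values are purely illustrative — they assume each NodeManager advertises 16384 MB and 4 vcores (via yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores), and that yarn.scheduler.maximum-allocation-mb permits a request that large. Adjust to match your own nodes.

```shell
# Sketch only: values assume 16 GB / 4-vcore NodeManagers.
# Each map container asks for a whole node's worth of memory and vcores,
# so the scheduler can place at most one such container per host.
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
  -D mapreduce.map.memory.mb=16384 \
  -D mapreduce.map.cpu.vcores=4 \
  -mapper ./your-multithreaded-program \
  -input /user/you/input \
  -output /user/you/output
```

Note that vcore requests are only enforced under schedulers/resource calculators that account for CPU (e.g. the CapacityScheduler with the DominantResourceCalculator); with the default memory-only accounting, the memory request alone has to do the work of excluding other containers.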
On Wed, Jan 29, 2014 at 3:30 AM, Keith Wiley <[email protected]> wrote:
> I'm running a program which in the streaming layer automatically multithreads
> and does so by automatically detecting the number of cores on the machine. I
> realize this model is somewhat in conflict with Hadoop, but nonetheless,
> that's what I'm doing. Thus, for even resource utilization, it would be nice
> to not only assign one mapper per core, but only one mapper per machine. I
> realize that if I saturate the cluster none of this really matters, but
> consider the following example for clarity: 4-core nodes, 10-node cluster,
> thus 40 slots, fully configured across mappers and reducers (40 slots of
> each). Say I run this program with just two mappers. It would run much more
> efficiently (in essentially half the time) if I could force the two mappers
> to go to slots on two separate machines instead of running the risk that
> Hadoop may assign them both to the same machine.
>
> Can this be done?
>
> Thanks.
>
> ________________________________________________________________________________
> Keith Wiley     [email protected]     keithwiley.com    music.keithwiley.com
>
> "Yet mark his perfect self-contentment, and hence learn his lesson, that to be
> self-contented is to be vile and ignorant, and that to aspire is better than
> to be blindly and impotently happy."
>   -- Edwin A. Abbott, Flatland
> ________________________________________________________________________________

-- 
Harsh J
