In theory this should work: find the part of the Hadoop code that calculates the number of cores and patch it to always return one.
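For JVM-based tools, the "patch the core count" idea usually comes down to a single call: multithreaded programs typically size their worker pools via Runtime.getRuntime().availableProcessors(). A minimal sketch of the pattern (the class name and the patched variant are illustrative, not Hadoop's actual code):

```java
// Sketch: how a multithreaded tool typically auto-sizes its worker pool.
// The "patch" suggested above amounts to replacing the detected count
// with a constant 1.
public class PoolSize {
    static int detectedCores() {
        // Unpatched: ask the JVM how many cores are visible.
        return Runtime.getRuntime().availableProcessors();
        // Patched variant would be: return 1;
    }

    public static void main(String[] args) {
        System.out.println("worker threads = " + detectedCores());
    }
}
```

Note that if the streaming program is native code rather than JVM-based, the equivalent call would be something like sysconf(_SC_NPROCESSORS_ONLN), and the same one-line patch applies.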
On Wed, Jan 29, 2014 at 3:41 AM, Keith Wiley <[email protected]> wrote:

> Yeah, it isn't, not even remotely, but thanks.
>
> On Jan 28, 2014, at 14:06, Bryan Beaudreault wrote:
>
> > If this cluster is being used exclusively for this goal, you could just
> > set mapred.tasktracker.map.tasks.maximum to 1.
> >
> > On Tue, Jan 28, 2014 at 5:00 PM, Keith Wiley <[email protected]> wrote:
> >
> > > I'm running a program which in the streaming layer automatically
> > > multithreads, and does so by automatically detecting the number of
> > > cores on the machine. I realize this model is somewhat in conflict
> > > with Hadoop, but nonetheless, that's what I'm doing. Thus, for even
> > > resource utilization, it would be nice to not only assign one mapper
> > > per core, but only one mapper per machine. I realize that if I
> > > saturate the cluster none of this really matters, but consider the
> > > following example for clarity: 4-core nodes, 10-node cluster, thus
> > > 40 slots, fully configured across mappers and reducers (40 slots of
> > > each). Say I run this program with just two mappers. It would run
> > > much more efficiently (in essentially half the time) if I could force
> > > the two mappers to go to slots on two separate machines instead of
> > > running the risk that Hadoop may assign them both to the same machine.
> > >
> > > Can this be done?
> > >
> > > Thanks.
>
> ________________________________________________________________________________
> Keith Wiley     [email protected]     keithwiley.com     music.keithwiley.com
>
> "Luminous beings are we, not this crude matter."
>                                            --  Yoda
> ________________________________________________________________________________
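For reference, Bryan's suggestion corresponds to the following TaskTracker setting in mapred-site.xml (this is the MRv1 property name; the sketch assumes a classic JobTracker/TaskTracker cluster, and the change requires a TaskTracker restart to take effect):

```xml
<!-- mapred-site.xml: cap each TaskTracker at one concurrent map task,
     so no node ever runs two of the multithreaded mappers at once -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
</property>
```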
