Ted, you could try the FairScheduler as well. See a comment I made a few hours ago on the same subject:
From: German Florez-Larrahondo [mailto:[email protected]]
Sent: Thursday, January 09, 2014 8:23 AM
To: [email protected]
Subject: RE: Distributing the code to multiple nodes

Ashish,

Could this be related to the scheduler you are using and its settings? In lab environments, when running a single type of job, I often use the FairScheduler (the YARN default in 2.2.0 is the CapacityScheduler), and it does a good job of distributing the load. You could give that a try (https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html).

I think just changing yarn-site.xml as follows could test this theory (note that how jobs are scheduled depends on resources such as memory on the nodes, and you would need to set up yarn-site.xml accordingly):

  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>

Regards,
./g

From: Ted Yu [mailto:[email protected]]
Sent: Thursday, January 09, 2014 11:00 AM
To: [email protected]
Subject: Re: expressing job anti-affinity in Yarn.

See: YARN-1042 add ability to specify affinity/anti-affinity in container requests

On Thu, Jan 9, 2014 at 8:48 AM, ricky l <[email protected]> wrote:

Hi all,

Is it possible to express job anti-affinity in YARN-based Hadoop? I have a job that is very IO-intensive, and I want to spread the tasks across all available machines. With the default YARN RM scheduler, it seems many tasks are scheduled on one machine while others are idle.

Thanks.
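For anyone trying the FairScheduler suggestion in this thread, a slightly fuller yarn-site.xml might look like the sketch below. It combines the scheduler switch from the message above with the per-node memory setting the message alludes to. The property names are standard Hadoop 2.x settings, but the memory value is only a placeholder to size for your own nodes, and whether `yarn.scheduler.fair.assignmultiple` noticeably improves spreading for a given job is an assumption to verify on your cluster, not something confirmed in this thread.

```xml
<!-- yarn-site.xml sketch: values are placeholders, size them for your nodes -->
<configuration>
  <!-- Switch the ResourceManager from the CapacityScheduler to the FairScheduler -->
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>

  <!-- Memory (MB) each NodeManager offers to containers; 8192 is a placeholder -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>

  <!-- Allow at most one container assignment per node heartbeat (false is the
       FairScheduler default; set explicitly here). This may help spread
       containers across nodes instead of filling one node first. -->
  <property>
    <name>yarn.scheduler.fair.assignmultiple</name>
    <value>false</value>
  </property>
</configuration>
```

The ResourceManager (and NodeManagers, for the memory setting) must be restarted for these changes to take effect.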
