Thanks, that is good to know. Is there any way to say "please fail if I don't get the node I want"? Do I just release the container and try again?

I'd like to understand the implications of this policy. Suppose I have 1000 data splits and a cluster capacity of 100 containers. If I try to schedule 200 tasks, requesting a local data node for each one, how do I ensure the highest chance that the tasks run against local data? Do I just ask for all 200 at once? Should I ask for 100 at a time and then re-target the remainder as containers come open?

Or am I thinking about this all wrong... perhaps I should ask for containers, see what nodes they are on, and then assign the data splits to them once I see the set of available containers?

john

From: Arun C Murthy [mailto:[email protected]]
Sent: Thursday, June 13, 2013 12:27 AM
To: [email protected]
Subject: Re: container allocation
By default, the ResourceManager will try to give you a container on that node, rack or anywhere (in that order).

We recently added the ability to whitelist or blacklist nodes to allow for more control.

Arun

On Jun 12, 2013, at 8:03 AM, John Lilley wrote:

If I request a container on a node, and that node is busy, will the request fail, or will it give me a container on a different node? In other words, is the node name a requirement or a hint?

Thanks
John

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
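
For illustration, the last idea raised in the thread (ask for containers first, see which nodes they landed on, then bind splits to them) could be sketched roughly as below. This is a standalone simulation and does not call any YARN API; the function name assign_splits, the host names, and the split ids are all hypothetical. It uses a simple greedy pass, which is an assumption made for brevity and is not guaranteed to find the best possible local matching.

```python
from collections import defaultdict


def assign_splits(container_hosts, split_locations):
    """Assign one split to each container, preferring node-local splits.

    container_hosts: list of host names, one per allocated container.
    split_locations: dict mapping split id -> set of hosts holding a replica.
    Returns a list of (container_index, split_id, is_local) tuples.
    """
    # Index splits by the hosts that hold a replica of them.
    by_host = defaultdict(list)
    for split, hosts in split_locations.items():
        for host in hosts:
            by_host[host].append(split)

    unassigned = set(split_locations)
    assignments = []
    leftover = []  # containers that found no local split

    # First pass: give each container a split stored on its own host.
    for idx, host in enumerate(container_hosts):
        local = next((s for s in by_host[host] if s in unassigned), None)
        if local is None:
            leftover.append(idx)
        else:
            unassigned.discard(local)
            assignments.append((idx, local, True))

    # Second pass: remaining containers take any remaining split (non-local).
    for idx in leftover:
        if not unassigned:
            break
        split = min(unassigned)  # deterministic pick, for the example only
        unassigned.discard(split)
        assignments.append((idx, split, False))

    return assignments


if __name__ == "__main__":
    containers = ["n1", "n2", "n3"]
    splits = {"s1": {"n1"}, "s2": {"n1", "n2"}, "s3": {"n4"}}
    for idx, split, is_local in assign_splits(containers, splits):
        print(idx, split, "local" if is_local else "remote")
```

Binding late like this means a container that arrives on an unexpected node can still be given local work if any unassigned split lives there. Splits left over after one round would simply wait for the next batch of containers.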
