Hi Hari, My response inline.
On Thu, Apr 11, 2013 at 11:51 AM, hari <[email protected]> wrote: > Hi list, > > I was looking at the node communication (in hadoop-2.0.3-alpha) > responsible for task assignment. So > far I saw that the resourcemanager and nodemanager > exchange heartbeat request/response. Very glad you're checking out YARN. Do provide feedback on what we can improve over the YARN JIRA when you get to try it! > I have a couple of questions on this: > > 1. I saw that the heartbeat response to nodemanager does > not include new task assignment. I think in the previous > versions (eg. 0.20.2) the task assignment was piggybacked in the > heartbeat response to the tasktracker. In the v2.0.3, so far > I could only see that the response can trigger nodemanager > shutdown, reboot, app cleanup, and container cleanup. Are there > any other actions being triggered on the nodemanager > by the heartbeat response ? Note: New term instead of 'Task', in YARN, is 'Container'. Yes, the heartbeats between the NodeManager and the ResourceManager does not account for container assignments anymore. Container launches are handled by a separate NodeManager-embedded service called the ContainerManager [1]. You can read all the functions a heartbeat currently does at a NodeManager under its service NodeStatusUpdater code at [2]. > 2. Is there a different communication path for task assignment ? > Is the scheduler making the remote calls or are there other classes outside > of yarn responsible for making the remote calls ? The latter. Scheduler no longer is responsible for asking NodeManagers to launch the containers. An ApplicationMaster just asks Scheduler to reserve it some containers with specified resource (Memory/CPU/etc.) allocations on available or preferred NodeManagers, and then once the ApplicationMaster gets a response back that the allocation succeeded, it manually communicates with ContainerManager on the allocated NodeManager, to launch the command needed to run the 'task'. A good-enough example to read would be the DistributedShell example. I've linked [3] to show where the above AppMaster -> ContainerManager requesting happens in its ApplicationMaster implementation, which should help clear this for you. [1] - https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java#L392 [startContainer, stopContainer, getContainerStatus etc… are all protocol-callable calls from an Application Master] [2] - https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java#L433 [Link is to the heartbeat retry loop, and then the response processing follows right after the loop] [3] - https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java#L756 [See the whole method, i.e. above from this code line point which is just the final RPC call, and you can notice how we build the command/environment to launch our 'task'] -- Harsh J
