All AuxiliaryServices are configured in yarn-site.xml with a service ID and a class is associated to the defined service ID. For MR2, one generally adds the two below properties, first defining the name (Service ID), second defining the class to launch for the defined name:
<name>yarn.nodemanager.aux-services</name> <value>mapreduce.shuffle</value> <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name> <value>org.apache.hadoop.mapred.ShuffleHandler</value> As part of the interface, all AuxiliaryServices may submit back some metadata that they would require clients to be aware of. For the ShuffleHandler, the port is rather important, so it serializes it via the getMeta() interface. [1] As part of any startContainer(…) response from the NodeManager's ContainerManager service, all metadata of all available auxiliary services are shipped back as part of a successful response to a container start request. This is a mapping based on (configured service ID -> metadata) for every Aux service configured and currently running. [2] A client, such as MR2, receives this batch of metadata and deserializes whatever it is looking for [3] using the service ID name string it is aware of [4]. [1] - https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java#L195 and https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java#L328 [2] - https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java#L502 [3] - https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java#L165 [4] - https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java#L148 On Wed, May 29, 2013 at 4:13 AM, John Lilley <[email protected]> wrote: > I was reading from the HortonWorks blog: > > “How MapReduce shuffle takes advantage of NM’s Auxiliary-services > > The Shuffle functionality required to run a MapReduce (MR) application is > implemented as an Auxiliary Service. This service starts up a Netty Web > Server, and knows how to handle MR specific shuffle requests from Reduce > tasks. The MR AM specifies the service id for the shuffle service, along > with security tokens that may be required. The NM provides the AM with the > port on which the shuffle service is running which is passed onto the Reduce > tasks.” > > How does the AM get the service ID and the port? > > Thanks, > > John > > -- Harsh J
