[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639740#comment-14639740 ]
Subru Krishnan commented on YARN-2884:
--------------------------------------

To give more context on the approach we took, please find below a summary of the offline discussions we had with [~kishorch], [~jianhe], [~leftnoteasy], [~zjshen], [~kkaranasos], [~chris.douglas].

One of the main drivers for the discussion was whether the AMRMProxy service needs to sit man-in-the-middle between the RM and the NM for the SASL handshake to succeed. On investigation we realized that we do need to swap the AMRMToken, since the AM registers with the AMRMProxy service instead of the RM and we must validate the AMRMToken. To achieve this we need either the RM's secret key or to generate and swap the AMRMToken in the AMRMProxy; we went with the latter approach for obvious reasons.

We considered a few options for plugging the AMRMProxy into the NM:
* Adding AMRMProxy as an auxiliary service: this looked like the least invasive method, but the AMRMProxy requires access to NM state (the SecretManager for generating local AMRMTokens, the StateStore for persisting/recovering across NM restarts without killing the AM, etc.). We want to isolate aux services from the NM and hence do not want to give them access to internal state.
* Making the NM ContainerManager pluggable and implementing the AMRMProxy as a custom ContainerManager that extends the default ContainerManagerImpl: this would give us all the leverage needed to implement the AMRMProxy, i.e. access to the NM context, the ability to man-in-the-middle container lifecycle events, etc. But it would increase the complexity of the already heavy ContainerManager, as we plan to support multiple handlers such as Federation (YARN-2915) and distributed scheduling (YARN-2877) in the AMRMProxy. Additionally, we want to retain the flexibility to deploy the AMRMProxy as an independent daemon in the future.

So the final approach we decided on was to plug in the AMRMProxy as an independent first-class service in the NM, with a flag to enable/disable it.
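To illustrate the token-swap idea described above, here is a minimal, self-contained Java sketch. It does not use the actual YARN classes from the patch; the class and method names (AMRMProxyTokenSwap, swapForLocalToken, swapBackForForwarding) and the string-based token representation are illustrative assumptions, standing in for the real AMRMTokenIdentifier/SecretManager machinery.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the AMRMToken swap performed by the AMRMProxy.
// Names and token representation are illustrative, not the YARN-2884 patch.
public class AMRMProxyTokenSwap {
  // RM-issued tokens set aside per application, keyed by application id.
  private final Map<String, String> rmTokens = new HashMap<>();
  private int localCounter = 0;

  // AM container pre-start: replace the RM-issued AMRMToken in the AM's
  // credentials with one issued locally, so the AM is forced to talk to
  // the proxy instead of the RM.
  public String swapForLocalToken(String appId, String rmToken) {
    rmTokens.put(appId, rmToken);
    return "LOCAL-" + appId + "-" + (++localCounter);
  }

  // On registerApplicationMaster: validate the locally issued token, then
  // restore the original RM token before forwarding the request to the RM.
  public String swapBackForForwarding(String appId, String localToken) {
    if (localToken == null || !localToken.startsWith("LOCAL-" + appId + "-")) {
      throw new IllegalArgumentException("Invalid local AMRMToken for " + appId);
    }
    return rmTokens.get(appId);
  }
}
```

The key property the sketch captures is that neither side needs the other's secret key: the proxy validates only tokens it issued itself, and the RM continues to see its own original token on forwarded requests.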
We added an AM container pre-start hook in the ContainerManager where we swap the AMRMToken issued by the RM with one issued locally by the AMRMProxy. On receiving the register application call, the AMRMProxy swaps back the original token issued by the RM and forwards the request.

> Proxying all AM-RM communications
> ---------------------------------
>
>                 Key: YARN-2884
>                 URL: https://issues.apache.org/jira/browse/YARN-2884
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, resourcemanager
>            Reporter: Carlo Curino
>            Assignee: Kishore Chaliparambil
>        Attachments: YARN-2884-V1.patch
>
>
> We introduce the notion of an RMProxy, running on each node (or once per
> rack). Upon start the AM is forced (via tokens and configuration) to direct
> all its requests to a new service running on the NM that provides a proxy to
> the central RM.
> This gives us a place to:
> 1) perform distributed scheduling decisions
> 2) throttle misbehaving AMs
> 3) mask the access to a federation of RMs

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)