Jian He commented on YARN-2884:

Looks good to me overall, but I think there are still some problems with the
AMRMProxyToken implementation. Basically, a long-running service may not work
with the AMRMProxy.

1) The code below in DefaultRequestInterceptor should create and return a new
AMRMProxyToken in the final returned allocate response when needed. Otherwise,
the AM will fail to talk to the AMRMProxy after the master key is rolled over
in the RM:
  public AllocateResponse allocate(AllocateRequest request)
      throws YarnException, IOException {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Forwarding allocate request to the real YARN RM");
    }
    AllocateResponse allocateResponse = rmClient.allocate(request);
    if (allocateResponse.getAMRMToken() != null) {
      // ...
    }
    return allocateResponse; // <==== no new AMRMProxyToken is issued here
  }
The code below, from ApplicationMasterService#allocate, shows how the RM itself does this:
      if (nextMasterKey != null
          && nextMasterKey.getMasterKey().getKeyId() != amrmTokenIdentifier
            .getKeyId()) {
        RMAppAttemptImpl appAttemptImpl = (RMAppAttemptImpl) appAttempt;
        Token<AMRMTokenIdentifier> amrmToken = appAttempt.getAMRMToken();
        if (nextMasterKey.getMasterKey().getKeyId() !=
            appAttemptImpl.getAMRMTokenKeyId()) {
          LOG.info("The AMRMToken has been rolled-over. Send new AMRMToken back"
              + " to application: " + applicationId);
          amrmToken = rmContext.getAMRMTokenSecretManager()
              .createAndGetAMRMToken(appAttemptId);
          appAttemptImpl.setAMRMToken(amrmToken);
        }
        allocateResponse.setAMRMToken(org.apache.hadoop.yarn.api.records.Token
            .newInstance(amrmToken.getIdentifier(), amrmToken.getKind()
                .toString(), amrmToken.getPassword(), amrmToken.getService()
                .toString()));
      }
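To illustrate the swap the interceptor would need to do on its side, here is a minimal, self-contained sketch of just that bookkeeping. All class, field, and method names below are illustrative stand-ins, not the real YARN types: a "token" is reduced to the id of the master key that signed it.

```java
// Illustrative sketch only: models the token swap point 1 asks for.
public class TokenSwapSketch {
  // Stand-in for a token: just the id of the master key that signed it.
  static final class FakeToken {
    final int keyId;
    FakeToken(int keyId) { this.keyId = keyId; }
  }

  // Stand-in for the RM's allocate response.
  static final class FakeAllocateResponse {
    FakeToken amRmToken;      // set by the RM only after a key rollover
    FakeToken amRmProxyToken; // what the interceptor should hand the AM
  }

  // The interceptor keeps the rolled-over real AMRMToken for itself and mints
  // a fresh AMRMProxy token for the AM instead of passing the real one along.
  static final class InterceptorSketch {
    FakeToken realToken = new FakeToken(1);
    int proxyKeyId = 100;

    FakeAllocateResponse intercept(FakeAllocateResponse fromRm) {
      if (fromRm.amRmToken != null
          && fromRm.amRmToken.keyId != realToken.keyId) {
        realToken = fromRm.amRmToken;                        // absorb the new real token
        fromRm.amRmToken = null;                             // never leak it to the AM
        fromRm.amRmProxyToken = new FakeToken(++proxyKeyId); // issue a new proxy token
      }
      return fromRm;
    }
  }

  public static void main(String[] args) {
    InterceptorSketch interceptor = new InterceptorSketch();

    // Steady state: no rollover, response passes through unchanged.
    FakeAllocateResponse quiet = interceptor.intercept(new FakeAllocateResponse());
    if (quiet.amRmProxyToken != null) throw new AssertionError("unexpected swap");

    // Rollover: RM sends a token signed with a new key; the AM must get a
    // proxy token and the real token must stay inside the interceptor.
    FakeAllocateResponse rolled = new FakeAllocateResponse();
    rolled.amRmToken = new FakeToken(2);
    FakeAllocateResponse out = interceptor.intercept(rolled);
    if (out.amRmToken != null) throw new AssertionError("real token leaked");
    if (out.amRmProxyToken == null) throw new AssertionError("no proxy token");
    if (interceptor.realToken.keyId != 2) throw new AssertionError("not absorbed");
    System.out.println("OK");
  }
}
```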
2) Some methods inside the AMRMProxyTokenSecretManager are not used at all; we
may want to remove them.

3) I think we need at least one end-to-end test for this. We can use
MiniYARNCluster to simulate the whole flow: the AM talks to the AMRMProxy,
which talks to the RM to register/allocate/finish. In the test, we should also
reduce RM_AMRM_TOKEN_MASTER_KEY_ROLLING_INTERVAL_SECS so that we can exercise
the token-renewal behavior. I'm OK with having a separate jira to track the
end-to-end test, as this is a bit of work.
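For the interval piece, the test setup could look roughly like this. This is an untested fragment, not a full test: the config constant and the MiniYARNCluster constructor are the existing YARN ones, while the test name, the 4-second interval, and the elided AM interaction are assumptions.

```java
// Untested setup fragment for the end-to-end test suggested above.
Configuration conf = new YarnConfiguration();
// Shrink the rolling interval so the master key rolls during the test.
conf.setLong(
    YarnConfiguration.RM_AMRM_TOKEN_MASTER_KEY_ROLLING_INTERVAL_SECS, 4);
MiniYARNCluster cluster =
    new MiniYARNCluster("testAMRMProxyTokenRollover", 1, 1, 1);
cluster.init(conf);
cluster.start();
// ... register an AM through the AMRMProxy, keep calling allocate past the
// rolling interval, finish, and assert the AM never saw the real AMRMToken.
```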

> Proxying all AM-RM communications
> ---------------------------------
>                 Key: YARN-2884
>                 URL: https://issues.apache.org/jira/browse/YARN-2884
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, resourcemanager
>            Reporter: Carlo Curino
>            Assignee: Kishore Chaliparambil
>         Attachments: YARN-2884-V1.patch, YARN-2884-V10.patch, 
> YARN-2884-V11.patch, YARN-2884-V2.patch, YARN-2884-V3.patch, 
> YARN-2884-V4.patch, YARN-2884-V5.patch, YARN-2884-V6.patch, 
> YARN-2884-V7.patch, YARN-2884-V8.patch, YARN-2884-V9.patch
> We introduce the notion of an RMProxy, running on each node (or once per 
> rack). Upon start, the AM is forced (via tokens and configuration) to direct 
> all its requests to a new service running on the NM that provides a proxy to 
> the central RM. 
> This gives us a place to:
> 1) perform distributed scheduling decisions
> 2) throttle misbehaving AMs
> 3) mask the access to a federation of RMs

This message was sent by Atlassian JIRA
