[
https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716153#comment-14716153
]
Jian He commented on YARN-2884:
-------------------------------
Looks good to me overall, but I think there are still some problems with the
AMRMProxyToken implementation. Basically, a long-running service may not work
with the AMRMProxy.
1) The code below in DefaultRequestInterceptor should create a new
AMRMProxyToken and return it in the final allocate response when needed.
Otherwise, the AM will fail to talk to the AMRMProxy after the key is rolled
over in the AMRMProxyTokenSecretManager.
{code}
  @Override
  public AllocateResponse allocate(AllocateRequest request)
      throws YarnException, IOException {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Forwarding allocate request to the real YARN RM");
    }
    AllocateResponse allocateResponse = rmClient.allocate(request);
    if (allocateResponse.getAMRMToken() != null) {
      updateAMRMToken(allocateResponse.getAMRMToken());
    }
    return allocateResponse; <====
  }
{code}
The code below in ApplicationMasterService#allocate shows how the RM does this.
{code}
      if (nextMasterKey != null
          && nextMasterKey.getMasterKey().getKeyId() != amrmTokenIdentifier
            .getKeyId()) {
        RMAppAttemptImpl appAttemptImpl = (RMAppAttemptImpl)appAttempt;
        Token<AMRMTokenIdentifier> amrmToken = appAttempt.getAMRMToken();
        if (nextMasterKey.getMasterKey().getKeyId() !=
            appAttemptImpl.getAMRMTokenKeyId()) {
          LOG.info("The AMRMToken has been rolled-over. Send new AMRMToken back"
              + " to application: " + applicationId);
          amrmToken = rmContext.getAMRMTokenSecretManager()
              .createAndGetAMRMToken(appAttemptId);
          appAttemptImpl.setAMRMToken(amrmToken);
        }
        allocateResponse.setAMRMToken(org.apache.hadoop.yarn.api.records.Token
          .newInstance(amrmToken.getIdentifier(), amrmToken.getKind()
            .toString(), amrmToken.getPassword(), amrmToken.getService()
            .toString()));
      }
{code}
2) Some methods inside the AMRMProxyTokenSecretManager are not used at all.
Can we remove them?
3) I think we need at least one end-to-end test for this. We can use
MiniYARNCluster to simulate the whole flow: the AM talks to the AMRMProxy, which
talks to the RM to register/allocate/finish. In the test, we should also reduce
RM_AMRM_TOKEN_MASTER_KEY_ROLLING_INTERVAL_SECS so that we can simulate the
token renewal behavior. I'm OK with having a separate JIRA to track the
end-to-end test, as this is a bit of work.
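To illustrate point 1, here is a rough, self-contained sketch of the proxy-side
behavior I have in mind. It does not use the real YARN classes: Token, Response,
ProxySecretManager and Interceptor are hypothetical stand-ins, and the rollover
is reduced to a single key id bump (the real secret managers distinguish
between the current and the next master key). It also assumes that the token
issued by the RM stays inside the proxy and that only a proxy-issued token is
sent to the AM.

```java
class ProxyTokenRolloverSketch {

  /** Hypothetical stand-in for a token; only the master key id matters here. */
  static final class Token {
    final int keyId;
    Token(int keyId) { this.keyId = keyId; }
  }

  /** Hypothetical stand-in for AllocateResponse. */
  static final class Response {
    Token amrmToken; // null means "no new token in this response"
  }

  /** Hypothetical stand-in for the proxy's token secret manager. */
  static final class ProxySecretManager {
    private int currentKeyId = 1;
    int currentKeyId() { return currentKeyId; }
    void rollKey() { currentKeyId++; } // simulates a master key rollover
    Token createToken() { return new Token(currentKeyId); }
  }

  /** Sketch of the interceptor logic; not the actual DefaultRequestInterceptor. */
  static final class Interceptor {
    private final ProxySecretManager proxySecrets;
    private int lastIssuedKeyId; // key id of the token the AM currently holds
    private Token rmToken;       // token the proxy itself uses towards the RM

    Interceptor(ProxySecretManager proxySecrets) {
      this.proxySecrets = proxySecrets;
      this.lastIssuedKeyId = proxySecrets.currentKeyId();
    }

    Token rmToken() { return rmToken; }

    Response allocate(Response fromRm) {
      // Same idea as updateAMRMToken(...): keep the RM's token for the proxy.
      if (fromRm.amrmToken != null) {
        rmToken = fromRm.amrmToken;
      }
      Response toAm = new Response();
      // If the proxy's key has rolled over since the AM last got a token,
      // mint a new proxy token and hand it back in this response.
      if (proxySecrets.currentKeyId() != lastIssuedKeyId) {
        Token fresh = proxySecrets.createToken();
        lastIssuedKeyId = fresh.keyId;
        toAm.amrmToken = fresh;
      }
      return toAm;
    }
  }

  static String describe(Response r) {
    return r.amrmToken == null
        ? "no token" : "token with key id " + r.amrmToken.keyId;
  }

  public static void main(String[] args) {
    ProxySecretManager secrets = new ProxySecretManager();
    Interceptor proxy = new Interceptor(secrets);
    System.out.println("before rollover: "
        + describe(proxy.allocate(new Response())));
    secrets.rollKey();
    System.out.println("after rollover:  "
        + describe(proxy.allocate(new Response())));
    System.out.println("next heartbeat:  "
        + describe(proxy.allocate(new Response())));
  }
}
```

An end-to-end test along the lines of point 3 would exercise the same sequence
against real components: allocate, wait for a rollover, allocate again, and
check that the AM receives a new token exactly once per rollover.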
> Proxying all AM-RM communications
> ---------------------------------
>
> Key: YARN-2884
> URL: https://issues.apache.org/jira/browse/YARN-2884
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager, resourcemanager
> Reporter: Carlo Curino
> Assignee: Kishore Chaliparambil
> Attachments: YARN-2884-V1.patch, YARN-2884-V10.patch,
> YARN-2884-V11.patch, YARN-2884-V2.patch, YARN-2884-V3.patch,
> YARN-2884-V4.patch, YARN-2884-V5.patch, YARN-2884-V6.patch,
> YARN-2884-V7.patch, YARN-2884-V8.patch, YARN-2884-V9.patch
>
>
> We introduce the notion of an RMProxy, running on each node (or once per
> rack). Upon start the AM is forced (via tokens and configuration) to direct
> all its requests to a new service running on the NM that provides a proxy to
> the central RM.
> This gives us a place to:
> 1) perform distributed scheduling decisions
> 2) throttle misbehaving AMs
> 3) mask the access to a federation of RMs
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)