[
https://issues.apache.org/jira/browse/YARN-9610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Morty Zhong updated YARN-9610:
------------------------------
Description:
In federation, `allocate` is asynchronous: each response from an RM is cached
in `asyncResponseSink`.
The final allocate response is merged from the allocate responses of all RMs,
and the merge throws an exception when the AMRMToken in a UAM response is not
null.
However, the AMRMToken in the UAM response is set to null outside the scope of
the lock, so there is a chance that the merge sees a UAM response whose
AMRMToken is still non-null.
We should therefore clear the token before adding the response to
`asyncResponseSink`.
{code:java}
synchronized (asyncResponseSink) {
  List<AllocateResponse> responses = null;
  if (asyncResponseSink.containsKey(subClusterId)) {
    responses = asyncResponseSink.get(subClusterId);
  } else {
    responses = new ArrayList<>();
    asyncResponseSink.put(subClusterId, responses);
  }
  responses.add(response);
  // Notify main thread about the response arrival
  asyncResponseSink.notifyAll();
}
...
if (this.isUAM && response.getAMRMToken() != null) {
  Token<AMRMTokenIdentifier> newToken = ConverterUtils
      .convertFromYarn(response.getAMRMToken(), (Text) null);
  // Do not further propagate the new amrmToken for UAM
  response.setAMRMToken(null);
  ...
{code}
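
The proposed ordering can be sketched as follows. This is a minimal,
self-contained illustration and not the actual patch: the class
ClearTokenBeforePublish, the nested Response type, the String token, the
lastSeenToken field and the drain() method are hypothetical stand-ins for
FederationInterceptor, AllocateResponse, the real AMRMToken and the merge
step. The only point it shows is that the token is consumed and cleared
before the response becomes visible through the sink.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ClearTokenBeforePublish {

  /** Hypothetical stand-in for AllocateResponse; only the token matters here. */
  static final class Response {
    private String amrmToken;
    Response(String amrmToken) { this.amrmToken = amrmToken; }
    String getAMRMToken() { return amrmToken; }
    void setAMRMToken(String token) { this.amrmToken = token; }
  }

  /** Stand-in for asyncResponseSink: responses grouped by sub-cluster id. */
  private final Map<String, List<Response>> asyncResponseSink = new HashMap<>();
  private final boolean isUAM;
  /** Hypothetical place to keep the rolled token for local use. */
  private volatile String lastSeenToken;

  ClearTokenBeforePublish(boolean isUAM) { this.isUAM = isUAM; }

  /** Heartbeat callback: consume and clear the token first, then publish. */
  void callback(String subClusterId, Response response) {
    if (isUAM && response.getAMRMToken() != null) {
      lastSeenToken = response.getAMRMToken();
      // Do not further propagate the new token for UAM. This happens
      // before the response is reachable by the merging thread.
      response.setAMRMToken(null);
    }
    synchronized (asyncResponseSink) {
      asyncResponseSink
          .computeIfAbsent(subClusterId, k -> new ArrayList<>())
          .add(response);
      // Notify the main thread about the response arrival
      asyncResponseSink.notifyAll();
    }
  }

  /** Merge side: drains the sink and rejects a response that still has a token. */
  List<Response> drain() {
    List<Response> merged = new ArrayList<>();
    synchronized (asyncResponseSink) {
      for (List<Response> responses : asyncResponseSink.values()) {
        for (Response r : responses) {
          if (r.getAMRMToken() != null) {
            throw new IllegalStateException("unexpected AMRMToken in UAM response");
          }
          merged.add(r);
        }
      }
      asyncResponseSink.clear();
    }
    return merged;
  }

  public static void main(String[] args) {
    ClearTokenBeforePublish interceptor = new ClearTokenBeforePublish(true);
    interceptor.callback("SC-1", new Response("rolled-token"));
    System.out.println(interceptor.drain().size()); // prints 1
  }
}
```

Because the response is mutated only before it is added under the lock, a
thread that later reads it under the same lock cannot observe a non-null
token. Clearing the token inside the synchronized block would give the same
guarantee; doing it beforehand just keeps the critical section short.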
> HeartbeatCallBack in FederationInterceptor should clear AMRMToken in response
> from UAM before adding it to asyncResponseSink
> ------------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-9610
> URL: https://issues.apache.org/jira/browse/YARN-9610
> Project: Hadoop YARN
> Issue Type: Bug
> Components: amrmproxy, federation
> Affects Versions: 3.1.2
> Reporter: Morty Zhong
> Priority: Major
>