[
https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383151#comment-14383151
]
zhihai xu commented on YARN-2893:
---------------------------------
[~adhoot], thanks for the review. I added a test case for the AMLauncher
changes in the new patch YARN-2893.002.patch.
The root cause of this bug is in the job client, which submitted a bad token in
the ApplicationSubmissionContext.
The change to RMAppManager#submitApplication is meant to catch this error
earlier, so that the user who submits the application sees the real cause of
the issue.
bq. The changes for RMAppManager#submitApplication seems to no longer return
RMAppRejectedEvent for any exception in
getDelegationTokenRenewer().addApplicationAsync. Is that deliberate?
I checked the code for DelegationTokenRenewer#addApplicationAsync and didn't
find any exception that addApplicationAsync itself can throw.
addApplicationAsync launches a thread to run handleDTRenewerAppSubmitEvent, and
any exception from handleDTRenewerAppSubmitEvent results in an
RMAppRejectedEvent being sent.
{code}
private void handleDTRenewerAppSubmitEvent(
    DelegationTokenRenewerAppSubmitEvent event) {
  try {
    // Setup tokens for renewal
    DelegationTokenRenewer.this.handleAppSubmitEvent(event);
    rmContext.getDispatcher().getEventHandler()
        .handle(new RMAppEvent(event.getApplicationId(),
            RMAppEventType.START));
  } catch (Throwable t) {
    LOG.warn(
        "Unable to add the application to the delegation token renewer.",
        t);
    // Sending APP_REJECTED is fine, since we assume that the
    // RMApp is in NEW state and thus we haven't yet informed the
    // Scheduler about the existence of the application
    rmContext.getDispatcher().getEventHandler().handle(
        new RMAppRejectedEvent(event.getApplicationId(), t.getMessage()));
  }
}
{code}
This is why I only catch the exception from parseCredentials.
The original code also expected only an exception from parseCredentials,
judging by its log message:
{code}
LOG.warn("Unable to parse credentials.", e);
{code}
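To illustrate the split described above, here is a minimal, self-contained
sketch. It is not the Hadoop code or the patch: SubmitSketch, the events list,
the length-prefixed token format, and registerForRenewal are all hypothetical
stand-ins. It only shows the general pattern, assuming parsing is the one
synchronous step that can fail: a bad token fails the submit call directly,
while any failure in the asynchronous step is turned into a rejection event.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SubmitSketch {
  // Hypothetical stand-in for the RM event dispatcher: records events as strings.
  static final List<String> events =
      Collections.synchronizedList(new ArrayList<String>());

  // Hypothetical stand-in for parseCredentials. The assumed format is a
  // 4-byte length followed by that many payload bytes.
  static byte[] parseCredentials(ByteBuffer tokens) throws IOException {
    byte[] raw = new byte[tokens.remaining()];
    tokens.duplicate().get(raw); // copy without moving the caller's position
    try (DataInputStream in =
        new DataInputStream(new ByteArrayInputStream(raw))) {
      int len = in.readInt(); // throws EOFException on fewer than 4 bytes
      if (len < 0 || len > raw.length - 4) {
        throw new EOFException("truncated token storage");
      }
      byte[] payload = new byte[len];
      in.readFully(payload);
      return payload;
    }
  }

  // Hypothetical asynchronous step; rejects empty credentials for the demo.
  static void registerForRenewal(String appId, byte[] credentials) {
    if (credentials.length == 0) {
      throw new IllegalStateException("no tokens to renew");
    }
  }

  static void submit(String appId, ByteBuffer tokens, ExecutorService pool)
      throws IOException {
    final byte[] credentials;
    try {
      // Synchronous check: a bad token fails the submit call itself,
      // so the submitter sees the real cause immediately.
      credentials = parseCredentials(tokens);
    } catch (IOException e) {
      throw new IOException("Unable to parse credentials for " + appId, e);
    }
    // Asynchronous step: any failure becomes a rejection event instead of
    // propagating to the caller.
    pool.execute(() -> {
      try {
        registerForRenewal(appId, credentials);
        events.add("START " + appId);
      } catch (Throwable t) {
        events.add("APP_REJECTED " + appId + ": " + t.getMessage());
      }
    });
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    ByteBuffer good = ByteBuffer.allocate(6).putInt(2).put(new byte[] {1, 2});
    good.flip();
    submit("app_1", good, pool);
    try {
      submit("app_2", ByteBuffer.wrap(new byte[] {0, 0}), pool);
    } catch (IOException e) {
      System.out.println("rejected at submit: " + e.getMessage());
    }
    pool.shutdown();
    pool.awaitTermination(5, TimeUnit.SECONDS);
    System.out.println(events);
  }
}
```

In this sketch the truncated buffer for app_2 never reaches the executor, which
mirrors the intent of catching only the parse failure in the submit path.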
> AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
> ------------------------------------------------------------------------------
>
> Key: YARN-2893
> URL: https://issues.apache.org/jira/browse/YARN-2893
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.4.0
> Reporter: Gera Shegalov
> Assignee: zhihai xu
> Attachments: YARN-2893.000.patch, YARN-2893.001.patch,
> YARN-2893.002.patch
>
>
> MapReduce jobs on our clusters experience sporadic failures due to corrupt
> tokens in the AM launch context.