zhihai xu commented on YARN-2893:

[~adhoot], thanks for the review. I added a test case for the AMLauncher
changes in the new patch, YARN-2893.002.patch.
The root cause of this bug is in the job client, which submitted a bad token.
The changes to RMAppManager#submitApplication are meant to catch this error
earlier, so the user who submits the application knows the real cause of the
failure.
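To illustrate why a corrupt token buffer can surface as an EOFException while the tokens are read, here is a minimal Java sketch. It assumes a simplified, hypothetical layout (4 magic bytes "HDTS", a version byte, then an int token count) that is only loosely modeled on Hadoop's Credentials token storage and is not its real format. The class TokenStreamCheck and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TokenStreamCheck {
    // Hypothetical simplified header: 4 magic bytes, 1 version byte, then an int token count.
    static final byte[] MAGIC = "HDTS".getBytes(StandardCharsets.US_ASCII);

    /** Parses the simplified header; truncated input surfaces as an EOFException. */
    static int readTokenCount(byte[] tokens) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(tokens));
        byte[] magic = new byte[MAGIC.length];
        in.readFully(magic);  // EOFException if fewer than 4 bytes are available
        if (!Arrays.equals(magic, MAGIC)) {
            throw new IOException("bad token storage header");
        }
        in.readByte();        // version byte (ignored in this sketch)
        return in.readInt();  // EOFException if fewer than 4 bytes remain
    }

    /** Returns "OK" or the simple name of the exception raised while parsing. */
    static String check(byte[] tokens) {
        try {
            readTokenCount(tokens);
            return "OK";
        } catch (IOException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        byte[] good = {'H', 'D', 'T', 'S', 0, 0, 0, 0, 2};
        byte[] truncated = Arrays.copyOf(good, 6);  // buffer cut off in the middle of the int
        byte[] badMagic = {'X', 'X', 'X', 'X', 0, 0, 0, 0, 2};
        System.out.println(check(good));       // OK
        System.out.println(check(truncated));  // EOFException
        System.out.println(check(badMagic));   // IOException
    }
}
```

Running such a check when the application is submitted, rather than only when the AM is launched, is what lets the submitting user see the parse error directly.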

bq. The changes for RMAppManager#submitApplication seems to no longer return 
RMAppRejectedEvent for any exception in 
getDelegationTokenRenewer().addApplicationAsync. Is that deliberate?
I checked the code for DelegationTokenRenewer#addApplicationAsync and didn't
find any exception that can be thrown from addApplicationAsync itself.
addApplicationAsync launches a thread to run handleDTRenewerAppSubmitEvent, and
any exception from handleDTRenewerAppSubmitEvent is reported as an
RMAppRejectedEvent:
    private void handleDTRenewerAppSubmitEvent(
        DelegationTokenRenewerAppSubmitEvent event) {
      try {
        // Setup tokens for renewal
        DelegationTokenRenewer.this.handleAppSubmitEvent(event);
        rmContext.getDispatcher().getEventHandler()
            .handle(new RMAppEvent(event.getApplicationId(),
                RMAppEventType.START));
      } catch (Throwable t) {
        LOG.warn(
            "Unable to add the application to the delegation token renewer.",
            t);
        // Sending APP_REJECTED is fine, since we assume that the
        // RMApp is in NEW state and thus we haven't yet informed the
        // Scheduler about the existence of the application
        rmContext.getDispatcher().getEventHandler().handle(
            new RMAppRejectedEvent(event.getApplicationId(), t.getMessage()));
      }
    }
This is why I only catch the exception from parseCredentials.
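The point about addApplicationAsync can be illustrated with a small sketch of the pattern: the caller only enqueues work, and any Throwable raised by the background task is turned into a rejection event instead of propagating back to the caller. This is a hypothetical stand-in, not the DelegationTokenRenewer code. AsyncRenewerSketch, the single-thread executor, and the list of event strings that plays the role of the RM dispatcher are all assumptions.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AsyncRenewerSketch {
    private final ExecutorService pool = Executors.newSingleThreadExecutor();
    // Hypothetical stand-in for the RM dispatcher: records events in the order received.
    final List<String> events = new CopyOnWriteArrayList<>();

    /** Returns immediately; failures are reported as events, never thrown to the caller. */
    void addApplicationAsync(String appId, Runnable setupTokens) {
        pool.execute(() -> {
            try {
                setupTokens.run();
                events.add(appId + ":START");
            } catch (Throwable t) {
                events.add(appId + ":REJECTED:" + t.getMessage());
            }
        });
    }

    /** Stops accepting work and waits for queued tasks; false on timeout or interrupt. */
    boolean shutdownAndWait() {
        pool.shutdown();
        try {
            return pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        AsyncRenewerSketch renewer = new AsyncRenewerSketch();
        renewer.addApplicationAsync("app_1", () -> {});
        renewer.addApplicationAsync("app_2", () -> { throw new IllegalStateException("renewal failed"); });
        renewer.shutdownAndWait();
        System.out.println(renewer.events);  // [app_1:START, app_2:REJECTED:renewal failed]
    }
}
```

Because the submitting thread never waits on the task, a try/catch around the async call in the submitter has nothing to catch; failures arrive later through the rejection path.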
Also, judging by its log message, the original code only expected an exception
from parseCredentials:

    LOG.warn("Unable to parse credentials.", e);
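A minimal sketch of the narrowed try block, under hypothetical names: SubmitSketch, a stand-in parseCredentials that only rejects an empty buffer, and a string log in place of LOG and the dispatcher. Only the parse step sits inside the try, so only a parse failure is logged as "Unable to parse credentials." and rethrown to the submitter. The asynchronous hand-off stays outside the try.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class SubmitSketch {
    // Hypothetical stand-in for LOG output and dispatched events, in order.
    final List<String> log = new ArrayList<>();

    /** Hypothetical parse step: fails with IOException for an empty token buffer. */
    static String parseCredentials(byte[] tokens) throws IOException {
        if (tokens.length == 0) {
            throw new IOException("empty token buffer");
        }
        return tokens.length + " bytes";
    }

    /** Only parseCredentials is inside the try; the async hand-off is not. */
    void submitApplication(String appId, byte[] tokens) throws IOException {
        String credentials;
        try {
            credentials = parseCredentials(tokens);
        } catch (IOException e) {
            log.add("WARN Unable to parse credentials. " + e.getMessage());
            log.add(appId + ":REJECTED");
            throw e;  // the submitting client sees the real parse error
        }
        // Hypothetical async hand-off: later failures would arrive as a rejection event.
        log.add(appId + ":QUEUED " + credentials);
    }

    public static void main(String[] args) {
        SubmitSketch rm = new SubmitSketch();
        try {
            rm.submitApplication("app_1", new byte[] {1, 2});
            rm.submitApplication("app_2", new byte[0]);
        } catch (IOException e) {
            System.out.println("client saw: " + e.getMessage());
        }
        System.out.println(rm.log);
    }
}
```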

> AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
> ------------------------------------------------------------------------------
>                 Key: YARN-2893
>                 URL: https://issues.apache.org/jira/browse/YARN-2893
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.4.0
>            Reporter: Gera Shegalov
>            Assignee: zhihai xu
>         Attachments: YARN-2893.000.patch, YARN-2893.001.patch, 
> YARN-2893.002.patch
> MapReduce jobs on our clusters experience sporadic failures due to corrupt 
> tokens in the AM launch context.

This message was sent by Atlassian JIRA
