YCozy commented on YARN-10166:

We encountered the same issue. An AM was killed during NM failover, but it
still managed to send an allocate() heartbeat to the RM after its attempt had
been unregistered and before the AM process was completely gone. As a result,
the confusing ERROR entry "Application attempt ... doesn't exist" appears in
the RM log. Logging more information about the app would be a great way to
clear up the confusion.
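The race can be illustrated with a toy model. This is a hypothetical sketch, not Hadoop code: the class `AttemptCacheModel`, its set-based cache and the `AttemptNotFound` stand-in for ApplicationAttemptNotFoundException are all illustrative assumptions. It only shows the ordering described above: once the attempt is unregistered, a late allocate() from the same AM can no longer find it.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Toy model, NOT Hadoop code: class, field and method bodies are
 * hypothetical simplifications of the unregister/allocate race.
 */
public final class AttemptCacheModel {

    /** Hypothetical stand-in for ApplicationAttemptNotFoundException. */
    static final class AttemptNotFound extends RuntimeException {
        AttemptNotFound(String attemptId) {
            super("Application attempt " + attemptId
                + " doesn't exist in ApplicationMasterService cache.");
        }
    }

    // Attempts this toy service currently knows about
    private final Set<String> registered = ConcurrentHashMap.newKeySet();

    void registerAttempt(String attemptId) {
        registered.add(attemptId);
    }

    // Called when the app is killed or the AM finishes
    void unregisterAttempt(String attemptId) {
        registered.remove(attemptId);
    }

    // Heartbeat: fails once the attempt has been unregistered
    void allocate(String attemptId) {
        if (!registered.contains(attemptId)) {
            throw new AttemptNotFound(attemptId);
        }
        // request handling is omitted in this model
    }

    public static void main(String[] args) {
        AttemptCacheModel ams = new AttemptCacheModel();
        String id = "appattempt_1582520281010_15271_000001";
        ams.registerAttempt(id);
        ams.allocate(id);          // normal heartbeat succeeds
        ams.unregisterAttempt(id); // attempt removed from the cache
        try {
            ams.allocate(id);      // late heartbeat from the dying AM
        } catch (AttemptNotFound e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running `main` prints a message of the same shape as the one quoted in the issue, which by itself says nothing about why the attempt disappeared.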


Btw, why do we want this to be an ERROR for the RM?

> Add detail log for ApplicationAttemptNotFoundException
> ------------------------------------------------------
>                 Key: YARN-10166
>                 URL: https://issues.apache.org/jira/browse/YARN-10166
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>            Reporter: Youquan Lin
>            Priority: Minor
>              Labels: patch
>         Attachments: YARN-10166-001.patch, YARN-10166-002.patch, 
> YARN-10166-003.patch, YARN-10166-004.patch
>      Suppose user A kills the app; ApplicationMasterService then calls
> unregisterAttempt() for this app. Sometimes the app's AM keeps calling the
> allocate() method, and the following error is reported:
> {code:java}
> Application attempt appattempt_1582520281010_15271_000001 doesn't exist in 
> ApplicationMasterService cache.
> {code}
>     If user B is watching the AM log, they may be confused about why the
> attempt is no longer in the ApplicationMasterService cache. So I think we can
> add more detail to the ApplicationAttemptNotFoundException log, as follows:
> {code:java}
> Application attempt appattempt_1582630210671_14658_000001 doesn't exist in 
> ApplicationMasterService cache.App state: KILLED,finalStatus: KILLED 
> ,diagnostics: App application_1582630210671_14658 killed by userA from 
> {code}

This message was sent by Atlassian Jira