[ https://issues.apache.org/jira/browse/YARN-337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated YARN-337:
----------------------------

    Attachment: YARN-337.patch

Patch that sets the tracking URL to the RM app page when an AM attempt is 
killed.  Also refactored the places where this was done for FAILED attempts to 
better cover all the various ways an AM attempt can fail.
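
A minimal Java sketch of the redirect decision, under stated assumptions: `AttemptOutcome`, `TrackingUrlPolicy` and `proxyTarget` are hypothetical names (this is not the code in YARN-337.patch or in the real RM classes), and the RM app page is assumed to live at `<rm-web-base>/cluster/app/<appId>`. Once an attempt is KILLED or FAILED, the proxy stops trusting the AM's own URL and points at the RM page instead.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical attempt outcomes (a stand-in, not YARN's RMAppAttemptState).
enum AttemptOutcome { RUNNING, FINISHED, FAILED, KILLED }

// Hypothetical policy deciding where the web proxy sends a user who
// follows an application's tracking link.
final class TrackingUrlPolicy {
    // Outcomes after which the AM is gone, so its own URL is likely dead.
    private static final Set<AttemptOutcome> AM_GONE =
            EnumSet.of(AttemptOutcome.FAILED, AttemptOutcome.KILLED);

    private final String rmWebBase; // assumed, e.g. "http://rm-host:8088"

    TrackingUrlPolicy(String rmWebBase) {
        this.rmWebBase = rmWebBase;
    }

    // The RM's page for the application (path layout is an assumption).
    String rmAppPage(String appId) {
        return rmWebBase + "/cluster/app/" + appId;
    }

    // Redirect target: the AM's URL while it can serve it, else the RM page.
    String proxyTarget(String appId, AttemptOutcome outcome, String amTrackingUrl) {
        boolean noUsableAmUrl = amTrackingUrl == null || amTrackingUrl.isEmpty();
        if (AM_GONE.contains(outcome) || noUsableAmUrl) {
            return rmAppPage(appId);
        }
        return amTrackingUrl;
    }
}
```

Routing KILLED and FAILED through a single check roughly mirrors the refactoring mentioned in the patch comment: every way an attempt can end without a live AM falls through to the same fallback.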

As for the rejected unregistration, I'm tempted to leave that as-is, since there
will always be races between a YARN-level kill/fail and the app unregistering.
As long as we point to the RM app page when something goes wrong, the user at
least has a starting point for diagnosing the problem rather than a bad link to
nowhere.
                
> RM handles killed application tracking URL poorly
> -------------------------------------------------
>
>                 Key: YARN-337
>                 URL: https://issues.apache.org/jira/browse/YARN-337
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.0.2-alpha, 0.23.5
>            Reporter: Jason Lowe
>              Labels: usability
>         Attachments: YARN-337.patch
>
>
> When the ResourceManager kills an application, it leaves the proxy URL 
> redirecting to the original tracking URL for the application even though the 
> ApplicationMaster is no longer there to service it.  It should redirect 
> somewhere more useful, like the RM's web page for the application, where the 
> user can see that the application was killed and find links to the AM logs.
> In addition, while tearing down after the kill, the AM sometimes attempts to 
> unregister and provide an updated tracking URL, but unfortunately the RM has 
> already "forgotten" the AM due to the kill and refuses to process the 
> unregistration.  Instead it logs:
> {noformat}
> 2013-01-09 17:37:49,671 [IPC Server handler 2 on 8030] ERROR
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
> AppAttemptId doesnt exist in cache appattempt_1357575694478_28614_000001
> {noformat}
> It should go ahead and process the unregistration to update the tracking URL 
> since the application offered it.
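
One way the issue's suggestion could look, as a hedged Java sketch: `AttemptRegistry` and its methods are hypothetical and stand in for the RM's attempt cache (they are not the real `ApplicationMasterService` API). The idea is to remember killed or failed attempts instead of dropping them, so a late unregistration can still replace the RM-page fallback with the URL the AM offers rather than being rejected as unknown.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical bookkeeping for AM attempts; not the real ApplicationMasterService.
final class AttemptRegistry {
    private final Map<String, String> liveAttempts = new HashMap<>();     // attemptId -> tracking URL
    private final Map<String, String> finishedAttempts = new HashMap<>(); // kept after kill/fail

    void register(String attemptId, String trackingUrl) {
        liveAttempts.put(attemptId, trackingUrl);
    }

    // RM-side kill: the attempt leaves the live set but is remembered,
    // with the RM app page as its fallback tracking URL.
    void kill(String attemptId, String rmAppPage) {
        liveAttempts.remove(attemptId);
        finishedAttempts.put(attemptId, rmAppPage);
    }

    // Unregistration that also accepts attempts already killed by the RM.
    // Returns false only for attempts this registry has never seen.
    boolean unregister(String attemptId, String finalTrackingUrl) {
        String previous;
        if (liveAttempts.containsKey(attemptId)) {
            previous = liveAttempts.remove(attemptId);
        } else if (finishedAttempts.containsKey(attemptId)) {
            previous = finishedAttempts.get(attemptId);
        } else {
            return false; // truly unknown: where the RM would log an error
        }
        boolean offered = finalTrackingUrl != null && !finalTrackingUrl.isEmpty();
        finishedAttempts.put(attemptId, offered ? finalTrackingUrl : previous);
        return true;
    }

    String trackingUrl(String attemptId) {
        String url = liveAttempts.get(attemptId);
        return url != null ? url : finishedAttempts.get(attemptId);
    }
}
```

In a real RM the kill and the unregistration race, and retained entries would need some eviction policy; that race is the reason given in the comment above for leaving this path as-is in the patch.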

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira