[ 
https://issues.apache.org/jira/browse/YARN-875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702933#comment-13702933
 ] 

Xuan Gong commented on YARN-875:
--------------------------------

For the callback, we can catch Throwable and call handler.onError(Exception). 
This tells the ApplicationMaster to break out of its loop and enter its finish 
function. Eventually, AMRMClientAsync will call unregisterApplicationMaster 
and set the keepRunning flag to false, which stops the heartbeat thread.

But we can let the heartbeat thread stop a little earlier.
Option one: inside the catch block, call heartBeatThread.interrupt() and set 
keepRunning = false.
Option two: define a volatile Exception savedCallBackException and set it 
inside the catch block. Then, inside heartBeatThread.run(), always check 
whether savedCallBackException is null before calling allocate().
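
A rough sketch of how the two options could fit together, for illustration 
only. This is not the AMRMClientAsync code: AsyncClientSketch, 
CallbackHandler, awaitStop() and the integer "responses" are hypothetical 
stand-ins, and the allocate() call is simulated with a counter.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Simplified stand-in for an async AM-RM client; NOT the real Hadoop class. */
class AsyncClientSketch {
    /** Hypothetical callback interface, loosely modelled on the handler above. */
    interface CallbackHandler {
        void onResponse(int response);
        void onError(Throwable t);
    }

    private final CallbackHandler handler;
    private final BlockingQueue<Integer> responses = new LinkedBlockingQueue<>();
    private final AtomicInteger allocateCalls = new AtomicInteger();
    private volatile boolean keepRunning = true;
    // Option two: remembered callback failure, checked before each allocate().
    private volatile Throwable savedCallBackException;

    private final Thread heartBeatThread = new Thread(this::heartBeatLoop, "heartbeat");
    private final Thread handlerThread = new Thread(this::handlerLoop, "callback-handler");

    AsyncClientSketch(CallbackHandler handler) { this.handler = handler; }

    void start() { heartBeatThread.start(); handlerThread.start(); }

    /** Waits up to the given time for each thread to exit. */
    void awaitStop(long millisPerThread) {
        try {
            heartBeatThread.join(millisPerThread);
            handlerThread.join(millisPerThread);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    boolean isHeartBeatAlive() { return heartBeatThread.isAlive(); }
    int allocateCalls() { return allocateCalls.get(); }

    private void heartBeatLoop() {
        while (keepRunning) {
            if (savedCallBackException != null) {
                break; // option two: a callback failed, so skip allocate()
            }
            responses.add(allocateCalls.incrementAndGet()); // simulated allocate()
            try {
                Thread.sleep(10); // simulated heartbeat interval
            } catch (InterruptedException e) {
                break; // option one: interrupted from the catch block below
            }
        }
    }

    private void handlerLoop() {
        while (keepRunning) {
            try {
                handler.onResponse(responses.take());
            } catch (InterruptedException e) {
                break;
            } catch (Throwable t) {
                savedCallBackException = t;   // option two
                keepRunning = false;          // option one
                heartBeatThread.interrupt();  // option one
                handler.onError(t);           // tell the application to finish
                break;
            }
        }
    }

    public static void main(String[] args) {
        AsyncClientSketch client = new AsyncClientSketch(new CallbackHandler() {
            public void onResponse(int response) {
                if (response == 3) {
                    throw new IllegalStateException("callback failed on response 3");
                }
            }
            public void onError(Throwable t) {
                System.out.println("onError: " + t.getMessage());
            }
        });
        client.start();
        client.awaitStop(2000);
        System.out.println("heartbeat alive: " + client.isHeartBeatAlive());
    }
}
```

The sketch applies both options at once only to show where each would sit. 
With option one the heartbeat thread wakes from its sleep immediately, while 
with option two alone it would stop at the next check before allocate().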

[~bikassaha] any other suggestions?
                
> Application can hang if AMRMClientAsync callback thread has exception
> ---------------------------------------------------------------------
>
>                 Key: YARN-875
>                 URL: https://issues.apache.org/jira/browse/YARN-875
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.1.0-beta
>            Reporter: Bikas Saha
>            Assignee: Xuan Gong
>
> Currently that thread will die and then never callback. App can hang. 
> Possible solution could be to catch Throwable in the callback and then call 
> client.onError().

