[jira] [Commented] (YARN-417) Add a poller that allows the AM to receive notifications when it is assigned containers

Sandy Ryza (JIRA) Thu, 28 Feb 2013 10:47:14 -0800

    [ 
https://issues.apache.org/jira/browse/YARN-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13589785#comment-13589785
 ]


Sandy Ryza commented on YARN-417:
---------------------------------

bq. I think if ContainerExitCodes needs to be added then it should be its own 
jira 
Will move the container exit codes into a separate JIRA.

bq. The helper function would have helped because containers contain 
information set by 2 entities...
The issue is that there is not much information for a helper function to 
interpret.  From what I can tell, the framework defines only two special exit 
codes, and does not distinguish between OOMs and other kinds of container 
failures, or between a container killed because it was preempted and one 
killed because the RM lost track of it.  Those two exit codes are platform 
independent, but any other exit code can be both application and platform 
dependent, so AMRMClientAsync wouldn't know how to interpret it.  As 
ContainerStatuses coming from the RM are only in the context of container 
completions, ContainerState provides no extra information.  Additional 
information can sometimes be found in the diagnostics strings, but if the 
reasons that containers die are to be codified, I don't think it should be 
done by interpreting strings at the API level.
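
To make the limitation concrete, here is a minimal sketch of what such a 
helper could look like if it only knew about the two framework-defined exit 
codes.  The class name, method name, and the numeric values of ABORTED and 
INVALID are assumptions for illustration, not the YARN API.

```java
import java.util.Optional;

class ExitStatusSketch {
    // Placeholder values assumed for the two framework-defined exit codes.
    static final int ABORTED = -100;
    static final int INVALID = -1000;

    /** Describes only the framework-defined codes; everything else is opaque. */
    static Optional<String> describeFrameworkStatus(int exitStatus) {
        switch (exitStatus) {
            case ABORTED:
                // Preemption and "RM lost track of it" are not told apart here.
                return Optional.of("aborted by the framework");
            case INVALID:
                return Optional.of("no valid exit status recorded");
            default:
                // Application and platform dependent: the client cannot interpret it.
                return Optional.empty();
        }
    }

    public static void main(String[] args) {
        for (int status : new int[] {ABORTED, INVALID, 137}) {
            String text = describeFrameworkStatus(status)
                    .orElse("application/platform specific, left to the AM");
            System.out.println(status + " -> " + text);
        }
    }
}
```

Anything outside the two known codes falls through to the empty case, which 
is about all a generic helper could say without parsing diagnostics strings.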

bq. Why is client.start() being called in init? client.stop() is being called 
in stop().
registerApplicationMaster needs to be called after setting up the RM proxy, 
which occurs in AMRMClient#start, but before starting the heartbeater, which 
occurs in AMRMClientAsync#start.  Another way to accomplish this would be to 
move the code in AMRMClientImpl#start to AMRMClientImpl#init, which also seems 
reasonable to me.  A third way would be to call registerApplicationMaster from 
AMRMClientAsync#start.
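
The ordering constraint above can be sketched as follows, using the third 
option (registering from the async wrapper's start).  BlockingClient and 
AsyncClient are hypothetical stand-ins, not the real AMRMClient classes; 
they only record the order in which the steps happen.

```java
import java.util.ArrayList;
import java.util.List;

class LifecycleOrderSketch {
    /** Hypothetical stand-in for the blocking client that owns the RM proxy. */
    static class BlockingClient {
        final List<String> events;
        boolean proxyReady = false;

        BlockingClient(List<String> events) {
            this.events = events;
        }

        void start() {
            proxyReady = true;
            events.add("proxy-created");
        }

        void register() {
            // Registration needs the proxy, so it must come after start().
            if (!proxyReady) {
                throw new IllegalStateException("register called before proxy setup");
            }
            events.add("registered");
        }
    }

    /** Hypothetical async wrapper: proxy, then register, then heartbeats. */
    static class AsyncClient {
        final List<String> events = new ArrayList<>();
        final BlockingClient client = new BlockingClient(events);

        void start() {
            client.start();
            client.register();
            events.add("heartbeater-started");
        }
    }

    public static void main(String[] args) {
        AsyncClient async = new AsyncClient();
        async.start();
        System.out.println(async.events);
    }
}
```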

bq. I am wary of calling back on the heartbeat thread itself.
Will add a handling thread.
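
A rough sketch of the idea, assuming a simple queue hand-off: the heartbeat 
thread only enqueues responses, and a separate handler thread invokes the 
callbacks.  The class, the Callback interface, and the String payload are 
hypothetical simplifications, not the patch itself.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class HandlerThreadSketch {
    /** Hypothetical callback interface; a String stands in for the response. */
    interface Callback {
        void onContainersAllocated(String containers);
    }

    private final BlockingQueue<String> responses = new LinkedBlockingQueue<>();
    private final Thread handler;

    HandlerThreadSketch(Callback callback) {
        handler = new Thread(() -> {
            try {
                while (true) {
                    // Blocks until the heartbeat thread hands over a response.
                    callback.onContainersAllocated(responses.take());
                }
            } catch (InterruptedException e) {
                // Interrupted by stop(): let the thread exit.
            }
        }, "callback-handler");
    }

    void start() {
        handler.start();
    }

    /** Called from the heartbeat thread: enqueues only, never runs user code. */
    void onHeartbeatResponse(String response) {
        responses.add(response);
    }

    void stop() throws InterruptedException {
        handler.interrupt();
        handler.join();
    }

    /** Small demonstration: records which thread ran each callback. */
    static List<String> demo() {
        List<String> seen = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(2);
        HandlerThreadSketch sketch = new HandlerThreadSketch(c -> {
            seen.add(Thread.currentThread().getName() + ":" + c);
            done.countDown();
        });
        sketch.start();
        sketch.onHeartbeatResponse("container-1");
        sketch.onHeartbeatResponse("container-2");
        try {
            done.await(5, TimeUnit.SECONDS);
            sketch.stop();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With this split, a slow callback delays other callbacks but not the 
heartbeats to the RM.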

bq. Not waiting for the thread to join()? Why interrupt()? Thread needs to be 
stopped first so that it stops calling into the client. or else it can call 
into a client that has already stopped.
Good point. My reasons were that I've seen this as a convention in other 
places in YARN (see NodeStatusUpdaterImpl, for example), and that it would 
allow stop to be called from onContainerCompleted without deadlock.  With the 
handling thread, the latter shouldn't be a problem, so I'll change it.

                
> Add a poller that allows the AM to receive notifications when it is assigned 
> containers
> ---------------------------------------------------------------------------------------
>
>                 Key: YARN-417
>                 URL: https://issues.apache.org/jira/browse/YARN-417
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api, applications
>    Affects Versions: 2.0.3-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: AMRMClientAsync-1.java, AMRMClientAsync.java, 
> YARN-417-1.patch, YARN-417.patch, YarnAppMaster.java, 
> YarnAppMasterListener.java
>
>
> Writing AMs would be easier for some if they did not have to handle 
> heartbeating to the RM on their own.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
