[
https://issues.apache.org/jira/browse/YARN-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13589785#comment-13589785
]
Sandy Ryza commented on YARN-417:
---------------------------------
bq. I think if ContainerExitCodes needs to be added then it should be its own
jira
Will move the container exit codes into a separate JIRA.
bq. The helper function would have helped because containers contain
information set by 2 entities...
The issue is that there is not a ton of information for a helper function to
interpret. From what I can tell, the framework defines only two special exit
codes, and does not distinguish between OOMs and other kinds of container
failures, or between killing a container because it was preempted or because
the RM lost track of it. These exit codes are platform independent, and any
other exit codes can be both application and platform dependent, so the
AMRMClientAsync wouldn't know how to interpret them. Because the
ContainerStatuses coming from the RM are reported only for completed
containers, ContainerState provides no extra information. Additional information can
sometimes be found in the diagnostics strings, but if the reasons that
containers die are to be codified, I don't think it should be done by
interpreting strings at the API level.
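
To make the point concrete, here is a minimal sketch of how little such a helper could say. The class name, the constant names, and the numeric values are assumptions made purely for illustration; they are not taken from the patch or from the YARN API.

```java
// Sketch only. The constant names and numeric values are assumptions made
// for illustration; they are not taken from the patch or from the YARN API.
class ExitStatusSketch {
    // Hypothetical stand-ins for the two framework-defined exit codes.
    static final int ASSUMED_ABORTED = -100;
    static final int ASSUMED_DISKS_FAILED = -101;

    enum Outcome { ABORTED_BY_FRAMEWORK, DISKS_FAILED, UNINTERPRETED }

    // Anything other than the two special codes may be application and
    // platform dependent, so the helper cannot say more about it.
    static Outcome classify(int exitStatus) {
        switch (exitStatus) {
            case ASSUMED_ABORTED:
                return Outcome.ABORTED_BY_FRAMEWORK;
            case ASSUMED_DISKS_FAILED:
                return Outcome.DISKS_FAILED;
            default:
                return Outcome.UNINTERPRETED;
        }
    }
}
```

Under these assumptions, an OOM kill and any other application failure both fall into the uninterpreted bucket, and a preempted container is indistinguishable from one the RM lost track of.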
bq. Why is client.start() being called in init? client.stop() is being called
in stop().
registerApplicationMaster needs to be called after setting up the RM proxy,
which occurs in AMRMClient#start, but before starting the heartbeater, which
occurs in AMRMClientAsync#start. Another way to accomplish this would be to
move the code in AMRMClientImpl#start to AMRMClientImpl#init, which also seems
reasonable to me. A third way would be to call registerApplicationMaster from
AMRMClientAsync#start.
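
The ordering constraint can be sketched with stand-in classes. This follows the third option (registering from the async client's start); SyncClient and AsyncClient are hypothetical placeholders, not the real AMRMClient or AMRMClientAsync, and the heartbeat thread is reduced to a log entry.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: SyncClient and AsyncClient are hypothetical stand-ins.
class StartupOrderSketch {
    static class SyncClient {
        private final List<String> log;
        private boolean proxyReady = false;

        SyncClient(List<String> log) { this.log = log; }

        // Stands in for setting up the RM proxy.
        void start() {
            proxyReady = true;
            log.add("proxy");
        }

        // Registration is only legal once the proxy exists.
        void register() {
            if (!proxyReady) {
                throw new IllegalStateException("RM proxy not set up yet");
            }
            log.add("register");
        }
    }

    static class AsyncClient {
        private final SyncClient client;
        private final List<String> log;

        AsyncClient(SyncClient client, List<String> log) {
            this.client = client;
            this.log = log;
        }

        void start() {
            client.start();        // 1. set up the RM proxy
            client.register();     // 2. register the AM
            log.add("heartbeat");  // 3. only now begin heartbeating (thread omitted)
        }
    }

    // Returns the order in which the three steps ran.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        new AsyncClient(new SyncClient(log), log).start();
        return log;
    }
}
```

Calling register() before SyncClient.start() throws in this sketch, which is the situation the ordering is meant to rule out.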
bq. I am wary of calling back on the heartbeat thread itself.
Will add a handling thread.
bq. Not waiting for the thread to join()? Why interrupt()? Thread needs to be
stopped first so that it stops calling into the client. or else it can call
into a client that has already stopped.
Good point. My reason was that I've seen this as a convention in other places
in YARN (see NodeStatusUpdaterImpl, for example), and that it would allow stop
to be called from onContainerCompleted without deadlock. With the handling
thread, the latter shouldn't be a problem, so I'll change it.
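
A rough sketch of the arrangement discussed in the last two points: the heartbeat thread only enqueues responses, a separate handling thread delivers them, and stop waits for the heartbeat thread before the client is marked stopped. All names here are hypothetical, and the RM call and the callbacks are replaced by strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch only: every name here is hypothetical, not the patch's code.
class AsyncStopSketch {
    private final BlockingQueue<String> events = new LinkedBlockingQueue<>();
    private final List<String> handled =
        Collections.synchronizedList(new ArrayList<>());
    private volatile boolean keepRunning = true;
    private volatile boolean clientStopped = false;
    private volatile boolean calledAfterStop = false;

    // Heartbeat thread: talks to the (simulated) client and enqueues results.
    private final Thread heartbeater = new Thread(() -> {
        int n = 0;
        while (keepRunning) {
            if (clientStopped) {
                calledAfterStop = true;  // the bug the review comment warns about
            }
            events.add("response-" + n++);  // stands in for a heartbeat to the RM
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                return;
            }
        }
    });

    // Handling thread: runs callbacks so they never block the heartbeat.
    private final Thread handler = new Thread(() -> {
        try {
            while (true) {
                handled.add(events.take());  // stands in for invoking a callback
            }
        } catch (InterruptedException e) {
            // interrupted by stop(): exit quietly
        }
    });

    void start() {
        heartbeater.start();
        handler.start();
    }

    void stop() throws InterruptedException {
        keepRunning = false;
        heartbeater.interrupt();
        heartbeater.join();  // no more calls into the client after this point
        handler.interrupt();
        if (Thread.currentThread() != handler) {
            handler.join();  // skipped when stop() is called from a callback
        }
        clientStopped = true;  // stands in for stopping the underlying client
    }

    // Returns true if both threads ended and the client was never used after stop.
    static boolean runDemo() {
        AsyncStopSketch s = new AsyncStopSketch();
        s.start();
        try {
            Thread.sleep(50);
            s.stop();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !s.heartbeater.isAlive() && !s.handler.isAlive() && !s.calledAfterStop;
    }
}
```

Because callbacks run on the handling thread, a callback that calls stop() joins only the heartbeat thread, so it does not wait on itself.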
> Add a poller that allows the AM to receive notifications when it is assigned
> containers
> ---------------------------------------------------------------------------------------
>
> Key: YARN-417
> URL: https://issues.apache.org/jira/browse/YARN-417
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: api, applications
> Affects Versions: 2.0.3-alpha
> Reporter: Sandy Ryza
> Assignee: Sandy Ryza
> Attachments: AMRMClientAsync-1.java, AMRMClientAsync.java,
> YARN-417-1.patch, YARN-417.patch, YarnAppMaster.java,
> YarnAppMasterListener.java
>
>
> Writing AMs would be easier for some if they did not have to handle
> heartbeating to the RM on their own.