[
https://issues.apache.org/jira/browse/YARN-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13589296#comment-13589296
]
Bikas Saha commented on YARN-417:
---------------------------------
I think if ContainerExitCodes needs to be added then it should be its own jira
because it's an addition to the YARN API and should be kept distinct from this
jira. This jira could be marked dependent on that jira. From what I see in the
patch, ContainerExitCodes is also missing out of memory and preemption.
ContainerRequest is tightly coupled with the AMRMClient, and hence I had put it
inside AMRMClient. It's expected to be used in other places, and that's why
it's public.
The helper function would have helped because containers contain information
set by 2 entities - RM & NM. A container's "status" is a combination of
containerState and containerExitCode, e.g. the state could be running, in which
case exit codes don't matter. The state could be completed, in which case the
exit code can tell us whether it was killed or not. The exit code may not be
enough because the RM could preempt a container before it's launched, and hence
it may not have a real exit code. Exit codes are also not portable across
platforms (e.g. Windows and Linux). The helper function lets the library hide
all this and present a single status value for the user to look at: whether the
container is allocated, running, completed_with_success, killed, preempted, out
of memory, etc. At some point this could move into YARN, but as it evolves, the
library might be a good place to house it. Does that help clarify its utility?
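
To make the idea concrete, here is a minimal sketch of such a helper. Every
type and name in it (ContainerPhase, SimpleStatus, ContainerStatusHelper, the
preemptedByRm and killedForMemory flags, the FAILED value) is hypothetical and
is not part of the YARN API or the patch. It also assumes the library already
knows from the RM/NM whether a container was preempted or killed for memory,
and that exit code 0 means success.

```java
// Hypothetical types for illustration only; none of these are YARN classes.
enum ContainerPhase { ALLOCATED, RUNNING, COMPLETED }

enum SimpleStatus {
    ALLOCATED, RUNNING, COMPLETED_WITH_SUCCESS, FAILED, KILLED, PREEMPTED, OUT_OF_MEMORY
}

final class ContainerStatusHelper {
    private ContainerStatusHelper() {}

    /**
     * Folds state plus exit information into one status value.
     *
     * @param preemptedByRm   assumed flag: the RM preempted the container
     * @param killedForMemory assumed flag: the container exceeded its memory limit
     * @param exitCode        raw exit code, or null if the container never launched
     */
    static SimpleStatus summarize(ContainerPhase phase, boolean preemptedByRm,
                                  boolean killedForMemory, Integer exitCode) {
        switch (phase) {
            case ALLOCATED:
                return SimpleStatus.ALLOCATED;
            case RUNNING:
                // Exit codes do not matter while the container is running.
                return SimpleStatus.RUNNING;
            case COMPLETED:
                // Check the platform-neutral flags before the raw exit code,
                // since a preempted container may have no real exit code.
                if (preemptedByRm) return SimpleStatus.PREEMPTED;
                if (killedForMemory) return SimpleStatus.OUT_OF_MEMORY;
                if (exitCode == null) return SimpleStatus.KILLED;
                return exitCode == 0
                        ? SimpleStatus.COMPLETED_WITH_SUCCESS
                        : SimpleStatus.FAILED;
            default:
                throw new IllegalArgumentException("unknown phase: " + phase);
        }
    }
}
```

With this shape, user code only switches on SimpleStatus and never inspects a
platform-specific exit code directly.
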
Why is client.start() being called in init()? client.stop() is being called in
stop(), so I would expect the matching client.start() to be in start().
{code}
+ @Override
+ public void init(Configuration conf) {
+ super.init(conf);
+ client.init(conf);
+ client.start();
+ }
{code}
Why is stop() not waiting for the thread to join()? Why interrupt()? The thread
needs to be stopped first so that it stops calling into the client, or else it
can call into a client that has already stopped.
{code}
+ @Override
+ public void stop() {
+ client.stop();
+ keepRunning = false;
+ thread.interrupt();
+ }
{code}
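
The ordering being asked for can be sketched as below. This is only an
illustration under assumptions: RmClient and HeartbeatService are hypothetical
stand-ins, not the classes in the patch, and the loop simply sleeps between
heartbeats. The sketch starts the client in start(), and in stop() it clears
the flag, waits for the thread with join(), and only then stops the client. It
does not call interrupt(), so stop() may wait up to roughly one heartbeat
interval.

```java
// Hypothetical stand-in for the wrapped client; not the AMRMClient API.
interface RmClient {
    void start();
    void heartbeat();
    void stop();
}

final class HeartbeatService {
    private final RmClient client;
    private final long intervalMs;
    private volatile boolean keepRunning;
    private Thread thread;

    HeartbeatService(RmClient client, long intervalMs) {
        this.client = client;
        this.intervalMs = intervalMs;
    }

    void start() {
        // The client is started here rather than in an init step.
        client.start();
        keepRunning = true;
        thread = new Thread(this::heartbeatLoop, "heartbeat");
        thread.start();
    }

    private void heartbeatLoop() {
        while (keepRunning) {
            client.heartbeat();
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                return; // treat an interrupt as a request to exit
            }
        }
    }

    void stop() {
        // 1. Ask the heartbeat thread to finish.
        keepRunning = false;
        try {
            // 2. Wait until it has made its last call into the client.
            thread.join();
        } catch (InterruptedException e) {
            // Restore the caller's interrupt flag and continue shutting down.
            Thread.currentThread().interrupt();
        }
        // 3. Only now stop the client.
        client.stop();
    }
}
```
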
I am wary of calling back on the heartbeat thread itself. If you notice the
interface patch I had uploaded, I had left some comments on moving this to its
own thread. This is important because the callback code can be arbitrary and
may not complete in time for our heartbeat, especially with 1000s of
containers. We cannot let our heartbeat rate be dependent on app code
performance.
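
One way to decouple the two, sketched under assumptions: the heartbeat thread
only enqueues results, and a separate handler thread runs the callback.
CallbackDispatcher and its method names are hypothetical and are not taken
from the patch or from the interface patch mentioned above. The queue here is
unbounded, so submit() never blocks the heartbeat thread; a real
implementation would have to decide what to do if the app falls far behind.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical helper: runs app callbacks on their own thread.
final class CallbackDispatcher<T> {
    private final BlockingQueue<T> pending = new LinkedBlockingQueue<>();
    private final Consumer<T> callback;
    private final Thread handler;
    private volatile boolean running = true;

    CallbackDispatcher(Consumer<T> callback) {
        this.callback = callback;
        this.handler = new Thread(this::drainLoop, "callback-handler");
    }

    void start() {
        handler.start();
    }

    // Called from the heartbeat thread; returns immediately.
    // Events submitted after stop() are not guaranteed to be delivered.
    void submit(T event) {
        pending.add(event);
    }

    private void drainLoop() {
        try {
            // Keep going until stop() was requested and the queue is drained.
            while (running || !pending.isEmpty()) {
                T event = pending.poll(50, TimeUnit.MILLISECONDS);
                if (event != null) {
                    callback.accept(event); // arbitrary, possibly slow, app code
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Delivers what is already queued, then waits for the handler to exit.
    void stop() {
        running = false;
        try {
            handler.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In this arrangement a slow callback delays later callbacks but not the next
heartbeat, which is the property asked for above.
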
> Add a poller that allows the AM to receive notifications when it is assigned
> containers
> ---------------------------------------------------------------------------------------
>
> Key: YARN-417
> URL: https://issues.apache.org/jira/browse/YARN-417
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: api, applications
> Affects Versions: 2.0.3-alpha
> Reporter: Sandy Ryza
> Assignee: Sandy Ryza
> Attachments: AMRMClientAsync-1.java, AMRMClientAsync.java,
> YARN-417-1.patch, YARN-417.patch, YarnAppMaster.java,
> YarnAppMasterListener.java
>
>
> Writing AMs would be easier for some if they did not have to handle
> heartbeating to the RM on their own.