[jira] [Commented] (YARN-417) Add a poller that allows the AM to receive notifications when it is assigned containers

Bikas Saha (JIRA) Wed, 27 Feb 2013 23:33:35 -0800

    [ https://issues.apache.org/jira/browse/YARN-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13589296#comment-13589296 ]


Bikas Saha commented on YARN-417:
---------------------------------

I think that if ContainerExitCodes needs to be added, it should be its own jira,
because it's an addition to the YARN API and should be kept distinct from this
jira. This jira could be marked as dependent on that one. From what I see in the
patch, it's also missing exit codes for out of memory and preemption.

ContainerRequest is something that's tightly coupled with the AMRMClient, and
hence I had put it inside AMRMClient. It's expected to be used in other places,
and that's why it's public.

The helper function would have helped because containers contain information
set by 2 entities, the RM and the NM, and a container's "status" is a
combination of its containerState and containerExitCode. For example, the state
could be running, in which case exit codes don't matter. The state could be
completed, in which case the exit code can tell us whether it was killed or
not. But the exit code may not be enough, because the RM could preempt a
container before it's launched, in which case there may be no real exit code.
Exit codes are also not portable across platforms (e.g. Windows and Linux). The
helper function lets the library hide all this and present a single status
value for the user to look at: whether the container is allocated, running,
completed_with_success, killed, preempted, out of memory, etc. At some point
this could move into YARN, but while it evolves, the library might be a good
place to house it. Does that help clarify its utility?
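
To make the idea concrete, here is a minimal sketch of such a helper. Every
type, constant and flag in it (RawState, AppContainerStatus,
ContainerStatusHelper, EXIT_KILLED, EXIT_OUT_OF_MEMORY, preemptedByRm) is
hypothetical and is not part of the YARN API or the attached patches; the point
is only that RM-side information is consulted first and the NM-side exit code
is interpreted only for completed containers.

```java
import java.util.Objects;

/** Hypothetical RM-side container state (not the YARN ContainerState enum). */
enum RawState { ALLOCATED, RUNNING, COMPLETE }

/** Hypothetical single status value that the library could expose to users. */
enum AppContainerStatus {
    ALLOCATED, RUNNING, COMPLETED_WITH_SUCCESS, KILLED, PREEMPTED, OUT_OF_MEMORY, FAILED
}

final class ContainerStatusHelper {
    // Hypothetical platform-neutral exit codes; real values would have to come
    // from a separate ContainerExitCodes-style API.
    static final int EXIT_SUCCESS = 0;
    static final int EXIT_KILLED = 1001;
    static final int EXIT_OUT_OF_MEMORY = 1002;

    private ContainerStatusHelper() {}

    /**
     * Combines information set by the RM (state, preemption) with information
     * set by the NM (exit code) into one status.
     *
     * @param exitCode      null if the container never launched
     * @param preemptedByRm hypothetical RM flag, needed because a container
     *                      preempted before launch has no real exit code
     */
    static AppContainerStatus derive(RawState state, Integer exitCode, boolean preemptedByRm) {
        Objects.requireNonNull(state, "state");
        if (state == RawState.ALLOCATED) {
            return AppContainerStatus.ALLOCATED;
        }
        if (state == RawState.RUNNING) {
            return AppContainerStatus.RUNNING; // exit code does not matter yet
        }
        // COMPLETE: trust the RM first, then interpret the NM exit code.
        if (preemptedByRm) {
            return AppContainerStatus.PREEMPTED;
        }
        if (exitCode == null) {
            return AppContainerStatus.FAILED;
        }
        switch (exitCode) {
            case EXIT_SUCCESS:       return AppContainerStatus.COMPLETED_WITH_SUCCESS;
            case EXIT_KILLED:        return AppContainerStatus.KILLED;
            case EXIT_OUT_OF_MEMORY: return AppContainerStatus.OUT_OF_MEMORY;
            default:                 return AppContainerStatus.FAILED;
        }
    }
}
```

A user would then look at one value, e.g. derive(RawState.COMPLETE, null, true)
gives PREEMPTED even though no exit code was ever produced.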

Why is client.start() being called in init()? client.stop() is being called in
stop(), so I would expect client.start() to be called in start().
{code}
+  @Override
+  public void init(Configuration conf) {
+    super.init(conf);
+    client.init(conf);
+    client.start();
+  }
{code}
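
A minimal sketch of the lifecycle I would expect, assuming a hypothetical
RecordingClient in place of AMRMClient and a String in place of Configuration so
that it runs on its own: init() only configures the client, and client.start()
happens in start(), mirroring client.stop() in stop().

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for AMRMClient that records lifecycle calls. */
class RecordingClient {
    final List<String> calls = new ArrayList<>();

    void init(String conf) { calls.add("init"); }
    void start()           { calls.add("start"); }
    void stop()            { calls.add("stop"); }
}

/** Hypothetical wrapper: each phase delegates to the same phase of the client. */
class LifecycleWrapper {
    private final RecordingClient client;

    LifecycleWrapper(RecordingClient client) { this.client = client; }

    void init(String conf) {
        client.init(conf); // configure only; nothing is started yet
    }

    void start() {
        client.start();    // start here, symmetric with client.stop() in stop()
    }

    void stop() {
        client.stop();
    }
}
```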

Not waiting for the thread to join()? Why interrupt()? The thread needs to be
stopped first so that it stops calling into the client; otherwise it can call
into a client that has already been stopped.
{code}
+  @Override
+  public void stop() {
+    client.stop();
+    keepRunning = false;
+    thread.interrupt();
+  }
{code}
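
A sketch of the shutdown ordering I have in mind, using a hypothetical
CheckedClient and HeartbeatService rather than the classes in the patch: signal
the loop, interrupt only to cut a sleep short, join() until the thread is really
gone, and only then stop the client. The join() is retried if the caller is
itself interrupted, so client.stop() never runs while the heartbeat thread is
still alive.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical client that counts calls made after it was stopped. */
class CheckedClient {
    private volatile boolean stopped;
    final AtomicInteger callsAfterStop = new AtomicInteger();

    void heartbeat() {
        if (stopped) {
            callsAfterStop.incrementAndGet();
        }
    }

    void stop() { stopped = true; }
}

/** Hypothetical heartbeat loop showing the stop() ordering. */
class HeartbeatService {
    private final CheckedClient client;
    private final Thread thread;
    private volatile boolean keepRunning = true;

    HeartbeatService(CheckedClient client, long intervalMs) {
        this.client = client;
        this.thread = new Thread(() -> {
            while (keepRunning) {
                client.heartbeat();
                try {
                    Thread.sleep(intervalMs);
                } catch (InterruptedException e) {
                    return; // interrupted by stop() to cut the sleep short
                }
            }
        }, "heartbeat");
    }

    void start() { thread.start(); }

    void stop() {
        keepRunning = false;   // 1. ask the loop to exit
        thread.interrupt();    // 2. wake it if it is sleeping
        boolean interrupted = false;
        while (thread.isAlive()) {
            try {
                thread.join(); // 3. wait until it can no longer call the client
            } catch (InterruptedException e) {
                interrupted = true; // keep waiting; restore the flag afterwards
            }
        }
        client.stop();         // 4. only now is it safe to stop the client
        if (interrupted) {
            Thread.currentThread().interrupt();
        }
    }
}
```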

I am wary of calling back on the heartbeat thread itself. In the interface patch
I had uploaded, I left some comments on moving this to its own thread. This is
important because the callback code can be arbitrary and may not complete in
time for our heartbeat, especially with thousands of containers. We cannot let
our heartbeat rate depend on app code performance.
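
One way to decouple the two, sketched with a hypothetical CallbackDispatcher
(not part of the patch): the heartbeat thread only enqueues each allocation
result on a LinkedBlockingQueue and returns immediately, while a dedicated
handler thread drains the queue and runs the app's callback, however long that
takes.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical dispatcher: the heartbeat thread enqueues, a handler thread calls back. */
class CallbackDispatcher<T> {
    private final BlockingQueue<List<T>> queue = new LinkedBlockingQueue<>();
    private final Thread handlerThread;
    private volatile boolean running = true;

    CallbackDispatcher(Consumer<List<T>> callback) {
        handlerThread = new Thread(() -> {
            try {
                // Keep going until stopped AND everything queued has been delivered.
                while (running || !queue.isEmpty()) {
                    List<T> batch = queue.poll(100, TimeUnit.MILLISECONDS);
                    if (batch != null) {
                        callback.accept(batch); // arbitrary, possibly slow app code
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "callback-handler");
    }

    void start() { handlerThread.start(); }

    /** Called on the heartbeat thread; never runs app code itself. */
    void onHeartbeatResponse(List<T> allocated) {
        queue.add(allocated);
    }

    /** Lets the handler drain the queue, then waits for it to exit. */
    void stop() {
        running = false;
        boolean interrupted = false;
        while (handlerThread.isAlive()) {
            try {
                handlerThread.join();
            } catch (InterruptedException e) {
                interrupted = true;
            }
        }
        if (interrupted) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The queue is unbounded in this sketch, so a very slow callback costs memory
rather than heartbeat latency; a bounded queue would be the alternative if that
trade-off is not acceptable.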
                
> Add a poller that allows the AM to receive notifications when it is assigned 
> containers
> ---------------------------------------------------------------------------------------
>
>                 Key: YARN-417
>                 URL: https://issues.apache.org/jira/browse/YARN-417
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api, applications
>    Affects Versions: 2.0.3-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: AMRMClientAsync-1.java, AMRMClientAsync.java, 
> YARN-417-1.patch, YARN-417.patch, YarnAppMaster.java, 
> YarnAppMasterListener.java
>
>
> Writing AMs would be easier for some if they did not have to handle 
> heartbeating to the RM on their own.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
