[ https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13909039#comment-13909039 ]

Ming Ma commented on YARN-221:
------------------------------

Jason, that is a good point. I had wondered why the MR AM is designed to call 
stopContainer even though task containers exit on their own. The JIRAs you 
mentioned provide good background.

We could have the MR AM wait for the notification chain: container exits -> NM 
notifies RM -> RM notifies AM. That would delay the point at which the AM can 
declare the job done. With the NM -> RM heartbeat interval used in big clusters, 
it could add a couple of seconds to the job. That might not be a big deal for 
regular MR jobs.

Another option: maybe the MR AM doesn't need to call stopContainer on 
containers that the RM has already reported as completed.

We still have a scenario where we want to sample X% of successful tasks. We 
can't specify that up front in the ContainerLaunchContext, because we don't 
know the status of the tasks at that point. The AM needs some way to adjust 
the log aggregation policy at runtime based on the number of successful tasks 
so far. For that, we need something like updateContainer.
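A rough Java sketch of the AM-side decision described above, to make the runtime-policy idea concrete. It is hypothetical: LogSamplingPolicy and shouldAggregate are made-up names, not YARN or MR APIs. It assumes the AM always keeps logs for failed tasks and aggregates a deterministic X% of successful ones. Delivering the decision to the NM would still need something like the proposed updateContainer, which does not exist in this form and is not shown.

```java
/**
 * Hypothetical AM-side helper (not a YARN/MR API): decides, as each task
 * container finishes, whether its logs should be aggregated.
 * Assumption: failed tasks always keep their logs; successful tasks are
 * sampled so that exactly floor(samplePercent * successes / 100) are kept.
 */
public final class LogSamplingPolicy {
    private final int samplePercent;
    private long successCount = 0; // successful containers seen so far
    private long sampledCount = 0; // successful containers chosen for aggregation

    public LogSamplingPolicy(int samplePercent) {
        if (samplePercent < 0 || samplePercent > 100) {
            throw new IllegalArgumentException("samplePercent must be in [0, 100]");
        }
        this.samplePercent = samplePercent;
    }

    /**
     * Returns true if logs for the container that just finished should be
     * aggregated. In a real AM the result would have to be pushed to the NM
     * through some runtime update call (assumed, e.g. the proposed updateContainer).
     */
    public synchronized boolean shouldAggregate(boolean taskSucceeded) {
        if (!taskSucceeded) {
            return true; // always keep logs of failed tasks
        }
        successCount++;
        // Sample this one only if it keeps sampled/successes <= samplePercent/100.
        if ((sampledCount + 1) * 100 <= (long) samplePercent * successCount) {
            sampledCount++;
            return true;
        }
        return false;
    }
}
```

A counter is used instead of a random draw so the sampled fraction is exact at every point in the job: with samplePercent = 50, every second successful task is aggregated, and with 10 exactly 10 of the first 100 successes are.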


> NM should provide a way for AM to tell it not to aggregate logs.
> ----------------------------------------------------------------
>
>                 Key: YARN-221
>                 URL: https://issues.apache.org/jira/browse/YARN-221
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Robert Joseph Evans
>            Assignee: Chris Trezzo
>         Attachments: YARN-221-trunk-v1.patch
>
>
> The NodeManager should provide a way for an AM to tell it that the logs 
> should not be aggregated, that they should be aggregated with a high 
> priority, or that they should be aggregated with a lower priority. The 
> AM should be able to do this in the ContainerLaunchContext to provide a 
> default value, but should also be able to update the value when the container 
> is released.
> This would allow the NM to skip log aggregation in some cases, and avoid 
> connecting to the NN at all.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
