[ 
https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903297#comment-13903297
 ] 

Jason Lowe commented on YARN-221:
---------------------------------

Personally I think the AM racing to kill tasks that have indicated they are 
done is a bug.  It causes all sorts of problems:

- Occasional "Container killed by ApplicationMaster" messages on otherwise 
normal tasks confuse users into thinking something went wrong for some of 
their tasks
- Trying to take a Java profile for a task can fail if the profile dump takes 
too long or the kill arrives too quickly (see MAPREDUCE-5465)
- Killing a task that should otherwise be exiting on its own creates a constant 
race-condition scenario that has caused problems in other similar setups (see 
MAPREDUCE-4157 for a similar situation where the RM was killing AMs too early 
and causing problems).

I think we should fix these races by implementing a reasonable delay between a 
task reporting a terminal state and a kill being issued by the AM.  That allows 
the task to complete on its own with an appropriate exit code, eliminating the 
need to specify log states on stop as a workaround.
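
To make the idea concrete, here is a rough sketch of what such a delay could 
look like. It is not taken from the Hadoop code base: the class name 
DelayedKillScheduler, its methods, and the grace period are all hypothetical, 
and a real change would have to hook into the AM's existing container 
handling. The kill is armed when the task reports a terminal state and is 
cancelled if the container exits on its own before the grace period elapses.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical helper (not a Hadoop class): delays the AM's kill of a finished task. */
class DelayedKillScheduler {
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor();
    // Kills that are armed but have not fired yet, keyed by container ID.
    private final Map<String, ScheduledFuture<?>> pendingKills = new HashMap<>();
    private final long graceMillis;             // assumed tunable grace period
    private final Consumer<String> killAction;  // stands in for the real kill call

    DelayedKillScheduler(long graceMillis, Consumer<String> killAction) {
        this.graceMillis = graceMillis;
        this.killAction = killAction;
    }

    /** Task reported a terminal state: arm a kill instead of issuing it now. */
    synchronized void onTaskReportedDone(String containerId) {
        if (pendingKills.containsKey(containerId)) {
            return; // a kill is already armed for this container
        }
        ScheduledFuture<?> future = timer.schedule(
            () -> fire(containerId), graceMillis, TimeUnit.MILLISECONDS);
        pendingKills.put(containerId, future);
    }

    /** Container exited on its own: cancel the armed kill. Returns true if one was pending. */
    synchronized boolean onContainerExited(String containerId) {
        ScheduledFuture<?> future = pendingKills.remove(containerId);
        if (future == null) {
            return false;
        }
        future.cancel(false);
        return true;
    }

    /** Grace period elapsed: kill only if the container has not exited meanwhile. */
    private void fire(String containerId) {
        boolean stillPending;
        synchronized (this) {
            stillPending = pendingKills.remove(containerId) != null;
        }
        if (stillPending) {
            killAction.accept(containerId);
        }
    }

    void shutdown() {
        timer.shutdownNow();
    }
}
```

With something along these lines, a task that exits promptly is never killed 
and keeps its own exit code, while a task that hangs after reporting done is 
still cleaned up once the grace period runs out.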

> NM should provide a way for AM to tell it not to aggregate logs.
> ----------------------------------------------------------------
>
>                 Key: YARN-221
>                 URL: https://issues.apache.org/jira/browse/YARN-221
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Robert Joseph Evans
>            Assignee: Chris Trezzo
>         Attachments: YARN-221-trunk-v1.patch
>
>
> The NodeManager should provide a way for an AM to tell it that either the 
> logs should not be aggregated, that they should be aggregated with a high 
> priority, or that they should be aggregated but with a lower priority.  The 
> AM should be able to do this in the ContainerLaunch context to provide a 
> default value, but should also be able to update the value when the container 
> is released.
> This would allow the NM to skip log aggregation in some cases and avoid 
> connecting to the NN at all.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
