[jira] [Commented] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.

Jason Lowe (JIRA) Mon, 24 Feb 2014 07:44:35 -0800

    [ 
https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910411#comment-13910411
 ]


Jason Lowe commented on YARN-221:
---------------------------------

bq. We can have MR AM wait for notification as in container exit -> NM notifies 
RM -> RM notifies AM. That will create some delay for AM to declare the job is 
done. With the NM -> RM heartbeat value used in big clusters, it could add 
couple seconds delay for the job. That might not be a big deal for regular MR 
jobs.

The NM does out-of-band heartbeats when containers exit, so the turnaround time 
can be shorter than a full NM heartbeat interval. 

If we're really concerned about any additional time added for graceful task 
exit, we can also have the AM unregister when the job succeeds/fails but before 
all tasks exit; the RM will then kill all of the application's remaining 
containers once the AM exits (or times out waiting).  In that sense 
it would not add any time from the job client's perspective, as the job could 
report completion at the same time it did before.  However, it would add some 
time from the YARN perspective, as the application would linger on the cluster 
in the FINISHING state a few seconds longer than it did before.

bq. One thing to add we need the definition and policy on how to handle those 
tasks that are in the finishing state and MR AM ends up stopping them as they 
don't exit by themselves.

I don't think we need to get too tricky here.  The NM will see the container 
return a non-zero exit code and assume that's a failure.  If tasks are succeeding 
but returning non-zero exit codes then that's probably a bug, and arguably it's a 
good thing we're grabbing the logs to show what went wrong when the task tried to 
tear down.  IMHO we should fix what's causing the non-zero exit code rather 
than try to add a mechanism to prevent logs from being aggregated in what 
should be a rare and abnormal case.
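
To make the exit-code argument concrete, here is a minimal sketch of the kind of decision being described.  It is not NodeManager code: the class name, the method names, and the aggregateOnSuccess flag are all hypothetical, and the policy (always keep logs for failed containers, follow an AM-supplied preference otherwise) is only one assumed reading of the discussion.

```java
// Hypothetical sketch only; none of these names exist in the NodeManager.
public final class ExitCodeAggregationSketch {

    /** Exit status of a container that completed normally. */
    public static final int SUCCESS_EXIT_CODE = 0;

    private ExitCodeAggregationSketch() {
    }

    /** Any non-zero exit code is treated as a failure. */
    public static boolean isFailure(int exitCode) {
        return exitCode != SUCCESS_EXIT_CODE;
    }

    /**
     * Assumed policy: logs of failed containers are always aggregated;
     * logs of successful containers are aggregated only if the AM asked
     * for that (the aggregateOnSuccess flag is an illustrative stand-in
     * for whatever preference the AM would pass).
     */
    public static boolean shouldAggregate(int exitCode, boolean aggregateOnSuccess) {
        return isFailure(exitCode) || aggregateOnSuccess;
    }

    public static void main(String[] args) {
        // Clean exit, AM opted out of aggregation: logs are skipped.
        System.out.println(shouldAggregate(0, false));
        // Non-zero exit (143 is the usual status after SIGTERM): logs are kept.
        System.out.println(shouldAggregate(143, false));
    }
}
```

Under this sketch a task that succeeds but is killed during teardown still gets its logs aggregated, which is the behavior argued for above.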

> NM should provide a way for AM to tell it not to aggregate logs.
> ----------------------------------------------------------------
>
>                 Key: YARN-221
>                 URL: https://issues.apache.org/jira/browse/YARN-221
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Robert Joseph Evans
>            Assignee: Chris Trezzo
>         Attachments: YARN-221-trunk-v1.patch
>
>
> The NodeManager should provide a way for an AM to tell it that either the 
> logs should not be aggregated, that they should be aggregated with a high 
> priority, or that they should be aggregated but with a lower priority.  The 
> AM should be able to do this in the ContainerLaunch context to provide a 
> default value, but should also be able to update the value when the container 
> is released.
> This would allow for the NM to not aggregate logs in some cases, and avoid 
> connection to the NN at all.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
