[ https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902291#comment-13902291 ]
Ming Ma commented on YARN-221:
------------------------------
[Chris Trezzo|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=ctrezzo],
[Gera Shegalov|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=jira.shegalov],
and I discussed this further. We would like to share an update and get
feedback from others. Similar to what Robert originally suggested, we need to
provide a way for the AM to update the log aggregation policy when it stops the
container.
One likely log aggregation policy for MRAppMaster is to aggregate the logs of
all failed tasks and of a sample of successful tasks. What we found is that the
container exit code isn't a reliable indication of whether an MR task finished
successfully. That is because MRAppMaster calls stopContainer while the
YarnChild JVM is exiting on its own. Depending on the timing, you might get a
non-zero exit code for a successful task. So specifying the log aggregation
policy up front in ContainerLaunchContext isn't enough.
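A minimal sketch of how an AM might pick such a policy at stop time. It is not taken from any patch on this issue: `SamplingPolicyChooser`, `LogPolicy`, its two values, and the sample rate are all hypothetical names used for illustration. The point it shows is that the decision is driven by the AM's own view of task success, not by the container exit code.

```java
import java.util.Random;

// Hypothetical AM-side helper; none of these names exist in YARN.
final class SamplingPolicyChooser {
    private final double successSampleRate; // fraction of successful tasks whose logs are kept
    private final Random random;

    SamplingPolicyChooser(double successSampleRate, Random random) {
        // Written this way so that NaN is rejected as well.
        if (!(successSampleRate >= 0.0 && successSampleRate <= 1.0)) {
            throw new IllegalArgumentException("successSampleRate must be in [0, 1]");
        }
        this.successSampleRate = successSampleRate;
        this.random = random;
    }

    // taskSucceeded is the AM's own task state, NOT derived from the exit code.
    LogPolicy choose(boolean taskSucceeded) {
        if (!taskSucceeded) {
            return LogPolicy.AGGREGATE; // always keep logs of failed tasks
        }
        // Keep logs for roughly successSampleRate of the successful tasks.
        return random.nextDouble() < successSampleRate
                ? LogPolicy.AGGREGATE
                : LogPolicy.DO_NOT_AGGREGATE;
    }

    public static void main(String[] args) {
        SamplingPolicyChooser chooser = new SamplingPolicyChooser(0.1, new Random());
        System.out.println("failed task     -> " + chooser.choose(false));
        System.out.println("successful task -> " + chooser.choose(true));
    }
}

// Assumed stand-in for the proposed ContainerLogAggregationPolicy; values are illustrative.
enum LogPolicy { AGGREGATE, DO_NOT_AGGREGATE }
```

Because the outcome for a successful task is random, the AM can only know the policy once the task has reported its final state, which is why it has to be passed when the container is stopped.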
The mechanism for the AM to pass the log aggregation policy to YARN needs to
address three scenarios:
1. Containers exit by themselves. DistributedShell belongs to this category.
2. The AM has to explicitly stop the containers. MR belongs to this category.
3. The AM might want to ask the NM to do on-demand log aggregation without
stopping the container. This might be useful for some long-running applications.
To support #1, we have to specify the log aggregation policy as part of the
startContainer call. Chris' patch handles that.
To support #2, the AM has to indicate to the NM during the stopContainer call
whether log aggregation is needed. The AM can use different types of policies,
such as sampling successful tasks. For that, the AM will specify the log
aggregation policy as part of StopContainerRequest.
{code:title=StopContainerRequest.java|borderStyle=solid}
...
/**
 * Get the <code>ContainerLogAggregationPolicy</code> for the container.
 *
 * @return The <code>ContainerLogAggregationPolicy</code> for the container.
 */
@Public
@Stable
public abstract ContainerLogAggregationPolicy getLogAggregationPolicy();

/**
 * Set the <code>ContainerLogAggregationPolicy</code> for the container.
 *
 * @param policy The <code>ContainerLogAggregationPolicy</code> for the
 *               container.
 */
@Public
@Stable
public abstract void setLogAggregationPolicy(ContainerLogAggregationPolicy policy);
{code}
Alternatively, we can define a new interface called ContainerStopContext to
capture the log aggregation policy and any other information we want to include
later.
{code:title=StopContainerRequest.java|borderStyle=solid}
@Public
@Stable
public abstract ContainerStopContext getContainerStopContext();
@Public
@Stable
public abstract void setContainerStopContext(ContainerStopContext context);
{code}
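To illustrate the ContainerStopContext alternative, here is a rough, self-contained sketch of how an AM might fill in a stop request. The classes below are plain stand-ins written for this example, not the real YARN records (which would be abstract, protobuf-backed types); `StopContainerRequestSketch`, `StopContextDemo`, `buildStopRequest`, and the two enum values are assumptions.

```java
// Hypothetical demo of the ContainerStopContext alternative; stand-in types only.
final class StopContextDemo {
    // Wraps the chosen policy in a stop context and attaches it to the request.
    static StopContainerRequestSketch buildStopRequest(ContainerLogAggregationPolicy policy) {
        ContainerStopContext context = new ContainerStopContext();
        context.setLogAggregationPolicy(policy);
        StopContainerRequestSketch request = new StopContainerRequestSketch();
        request.setContainerStopContext(context);
        return request;
    }

    public static void main(String[] args) {
        StopContainerRequestSketch request =
                buildStopRequest(ContainerLogAggregationPolicy.DO_NOT_AGGREGATE);
        System.out.println(request.getContainerStopContext().getLogAggregationPolicy());
    }
}

// Stand-in for the proposed ContainerStopContext; further fields could be added later.
final class ContainerStopContext {
    private ContainerLogAggregationPolicy logAggregationPolicy;

    ContainerLogAggregationPolicy getLogAggregationPolicy() {
        return logAggregationPolicy;
    }

    void setLogAggregationPolicy(ContainerLogAggregationPolicy policy) {
        this.logAggregationPolicy = policy;
    }
}

// Stand-in for StopContainerRequest, reduced to the two proposed accessors.
final class StopContainerRequestSketch {
    private ContainerStopContext context;

    ContainerStopContext getContainerStopContext() {
        return context;
    }

    void setContainerStopContext(ContainerStopContext context) {
        this.context = context;
    }
}

// The type name comes from the proposal; the values are assumed for illustration.
enum ContainerLogAggregationPolicy { AGGREGATE, DO_NOT_AGGREGATE }
```

The appeal of this shape is that new stop-time settings become fields on the context, so StopContainerRequest itself does not need to change again.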
To support #3, we need a new API, such as updateContainer, so that the AM can
ask the NM to roll the container log and update the log aggregation policy.
> NM should provide a way for AM to tell it not to aggregate logs.
> ----------------------------------------------------------------
>
> Key: YARN-221
> URL: https://issues.apache.org/jira/browse/YARN-221
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Reporter: Robert Joseph Evans
> Assignee: Chris Trezzo
> Attachments: YARN-221-trunk-v1.patch
>
>
> The NodeManager should provide a way for an AM to tell it that either the
> logs should not be aggregated, that they should be aggregated with a high
> priority, or that they should be aggregated but with a lower priority. The
> AM should be able to do this in the ContainerLaunch context to provide a
> default value, but should also be able to update the value when the container
> is released.
> This would allow for the NM to not aggregate logs in some cases, and avoid
> connection to the NN at all.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)