[
https://issues.apache.org/jira/browse/YARN-950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883294#comment-13883294
]
Jason Lowe commented on YARN-950:
---------------------------------
Ran into another case where a user filled a disk with a large stdout/stderr,
and the NM took forever to recover the disk since it was trying to aggregate
the huge file to HDFS. Not only did this waste HDFS space and network
bandwidth, but ops also could not easily recover by hand by removing the large
logfile. The NM process held the file open during log aggregation, so the disk
space could not be freed until the NM either finished aggregating or exited.
Many users would prefer the ability to grab a configurable number of bytes from
the head of a large log and a configurable number of bytes from its end. Of
course the NM would need to inject some text into the aggregated log to
indicate it was truncated, and bonus points if that text includes the original
log size and/or the amount that was truncated.
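For illustration only, here is a rough sketch of what that head/tail copy could
look like when the NM uploads a large log. The class and parameter names
(TruncatingLogCopier, headBytes, tailBytes) are made up for the example and are
not existing NM APIs:

{code:java}
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only, not existing NM code.  Copies the first
// headBytes and the last tailBytes of a local log file into the
// aggregation output stream, inserting a marker that records the
// original size and how much was skipped.
public class TruncatingLogCopier {
  public static void copyHeadAndTail(File log, OutputStream out,
      long headBytes, long tailBytes) throws IOException {
    long size = log.length();
    try (RandomAccessFile in = new RandomAccessFile(log, "r")) {
      if (size <= headBytes + tailBytes) {
        copyRange(in, out, 0, size);          // small enough, copy it all
        return;
      }
      copyRange(in, out, 0, headBytes);       // head of the log
      long skipped = size - headBytes - tailBytes;
      String marker = String.format(
          "%n...[log truncated: original size %d bytes, %d bytes skipped]...%n",
          size, skipped);
      out.write(marker.getBytes(StandardCharsets.UTF_8));
      copyRange(in, out, size - tailBytes, tailBytes);  // tail of the log
    }
  }

  private static void copyRange(RandomAccessFile in, OutputStream out,
      long offset, long length) throws IOException {
    byte[] buf = new byte[64 * 1024];
    in.seek(offset);
    long remaining = length;
    while (remaining > 0) {
      int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
      if (n < 0) {
        break;                                // unexpected EOF
      }
      out.write(buf, 0, n);
      remaining -= n;
    }
  }
}
{code}

The nice property of copying this way is that the amount read is bounded by
headBytes + tailBytes regardless of the log size and nothing is buffered whole,
so the NM can finish the upload and release the file quickly, letting the disk
space be reclaimed.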
> Ability to limit or avoid aggregating logs beyond a certain size
> ----------------------------------------------------------------
>
> Key: YARN-950
> URL: https://issues.apache.org/jira/browse/YARN-950
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Affects Versions: 0.23.9
> Reporter: Jason Lowe
>
> It would be nice if ops could configure a cluster such that any container log
> beyond a configured size would either have only a portion of the log
> aggregated or not be aggregated at all. This would help speed up the recovery
> path for cases where a container creates an enormous log and fills a disk,
> since currently the NM tries to aggregate the entire, enormous log rather
> than aggregating only a small portion or simply deleting it.