[ 
https://issues.apache.org/jira/browse/YARN-950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883294#comment-13883294
 ] 

Jason Lowe commented on YARN-950:
---------------------------------

Ran into another case where a user filled a disk with a large stdout/stderr, 
and the NM took forever to recover the disk since it was trying to aggregate 
the huge file to HDFS.  Not only was this a waste of HDFS space and network 
bandwidth, but ops also could not easily recover by hand by removing the large 
logfile.  The NM process held the file open during log aggregation, so the 
disk space could not be freed until the NM either finished aggregating or 
exited.

Many users would prefer the ability to grab a configurable number of bytes at 
the head of a large log and a configurable number of bytes at its tail.  Of 
course the NM would need to inject some text into the aggregated log to 
indicate it was truncated, and bonus points if that text includes the original 
log size and/or the amount that was truncated.
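
A rough sketch of the head/tail truncation I have in mind (purely illustrative 
Java, not a patch against the NM; the class and method names here are made up):

import java.io.*;

// Illustrative only: copy the first headBytes and last tailBytes of a large
// log into the aggregation stream, injecting a marker that records the
// original size and how much was omitted.
public class HeadTailLogTruncator {
  public static void copyTruncated(File log, OutputStream out,
      long headBytes, long tailBytes) throws IOException {
    long size = log.length();
    try (RandomAccessFile raf = new RandomAccessFile(log, "r")) {
      if (size <= headBytes + tailBytes) {
        copyRange(raf, out, 0, size);        // small enough, copy it all
        return;
      }
      copyRange(raf, out, 0, headBytes);     // head of the log
      long dropped = size - headBytes - tailBytes;
      String marker = String.format(
          "%n*** Log truncated: original size %d bytes, %d bytes omitted ***%n",
          size, dropped);
      out.write(marker.getBytes("UTF-8"));
      copyRange(raf, out, size - tailBytes, tailBytes);  // tail of the log
    }
  }

  private static void copyRange(RandomAccessFile raf, OutputStream out,
      long offset, long length) throws IOException {
    byte[] buf = new byte[64 * 1024];
    raf.seek(offset);
    long remaining = length;
    while (remaining > 0) {
      int n = raf.read(buf, 0, (int) Math.min(buf.length, remaining));
      if (n < 0) break;
      out.write(buf, 0, n);
      remaining -= n;
    }
  }
}

The head/tail byte counts would presumably come from new NM configuration 
properties; the exact property names are a separate discussion.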

> Ability to limit or avoid aggregating logs beyond a certain size
> ----------------------------------------------------------------
>
>                 Key: YARN-950
>                 URL: https://issues.apache.org/jira/browse/YARN-950
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 0.23.9
>            Reporter: Jason Lowe
>
> It would be nice if ops could configure a cluster such that any container log 
> beyond a configured size would either have only a portion of the log 
> aggregated or not be aggregated at all.  This would help speed up the recovery 
> path for cases where a container creates an enormous log and fills a disk, as 
> currently the NM tries to aggregate the entire, enormous log rather than 
> aggregating only a small portion or simply deleting it.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
