[ 
https://issues.apache.org/jira/browse/YARN-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253086#comment-15253086
 ] 

Robert Kanter commented on YARN-4766:
-------------------------------------

Looks good overall.  Here's a few things:
# {{AggregatedLogFormat#getPendingLogFilesToUploadForThisContainer}} should be 
marked {{@VisibleForTesting}}
# It's typically easier to maintain multiple constructors when you have the one 
with the most arguments do the "real" work and the others just call others with 
default values in arguments.  Can you make {{ApplicationImpl}} do that?  
# In {{ApplicationImpl#buildAppProto}}, it's catching an {{IOException}} (which 
shouldn't occur) and simply logging it.  If this does somehow occur, then it's 
going to continue executing, which is probably bad.  Given that the only caller 
of this method is already doing it from a try-catch block, I think we'd be 
better off throwing the {{IOException}}.  Also, the caller should log the 
{{Exception}} so that we get a stack trace.
# There's an extra newline above {{ApplicationImpl#ApplicationImpl}}
# {{AppLogAggregatorImpl#uploadLogsForContainers}} creates a new array named 
{{paths}} that's never used.
# In {{TestAppLogAggregatorImpl}}, 
{{testAggregatorWithRetentionPolicyDisabled_shouldUploadAllFiles}} and 
{{testAggregatorWhenNoFileOlderThanRetentionPolicy_ShouldUploadAll}} are 
identical other than a config property or two.  Can we make a helper than has 
most of the code and pass it the config properties so we can combine the code 
here?
# There's an extra newline in the {{DeletionServiceDeleteTaskProto}} proto 
message

> NM should not aggregate logs older than the retention policy
> ------------------------------------------------------------
>
>                 Key: YARN-4766
>                 URL: https://issues.apache.org/jira/browse/YARN-4766
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: log-aggregation, nodemanager
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>         Attachments: yarn4766.001.patch, yarn4766.002.patch, 
> yarn4766.003.patch
>
>
> When a log aggregation fails on the NM the information is for the attempt is 
> kept in the recovery DB. Log aggregation can fail for multiple reasons which 
> are often related to HDFS space or permissions.
> On restart the recovery DB is read and if an application attempt needs its 
> logs aggregated, the files are scheduled for aggregation without any checks. 
> The log files could be older than the retention limit in which case we should 
> not aggregate them but immediately mark them for deletion from the local file 
> system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to