[ https://issues.apache.org/jira/browse/YARN-7952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16386700#comment-16386700 ]
Wangda Tan commented on YARN-7952: ---------------------------------- Thanks [~xgong], It might be better to update default value in a separate JIRA since it is not directly related to this one. 1) nmLogAggregationStatusTracker should be part of NodeManager.java instead of CM. 2) NMLogAggregationStatusTracker: 2.1 It looks like pullLock should be a readLock and updateLocker should be a writeLock since pull doesn't change internal data, correct? 2.2 It's better to rename pullLocker/updateLocker to read/writeLock for readability. 2.3 For overall workflow of this class, IMO it should be: a. When log aggregation starts, application will be added to {{NMLogAggregationStatusTracker}} with timestamp. b. When RM acknowledges application finished, application will be removed from the {{NMLogAggregationStatusTracker}} c. If configured timeout reaches, application will be removed from the {{NMLogAggregationStatusTracker}}. I can see c. is handled by {{rollLogAggregationStatus}} and a. is handled by {{updateLogAggregationStatus}}, but not sure about b. When I look at logic below, I'm not sure if it works: {code} if (currentTime - updateTime > rollingInterval) { LOG.warn("Ignore the log aggregation status update request " + "for the application:" + appId + ". The log aggregation status" + " update time is " + updateTime + " while the request process " + "time is " + currentTime + "."); return; } {code} Since the currentTime always almost equal to updateTime. The if statement should be false, correct? And I think if application is null in NM side, we should not add it to the track, correct? Instead of this, should we explicitly notify {{NMLogAggregationStatusTracker}} in {{ApplicationImpl.AppLogsAggregatedTransition}}? (It is true that {{ApplicationImpl.AppLogsAggregatedTransition}} means RM acks the log is finished?) 2.4 I'm not sure if we should ever change the {{lastModifiedTime}}. IIUC, the RM will timeout log aggregation status after configured timeout. What's the purpose of updating the {{lastModifiedTime}}? Will this cause OOM issue potentially? 2.5 updateLogAggregationStatus: - the {{long updateTime}} parameter is not necessary, can always use System.currentTime instead. 3) {{LogAggregationTrakcer}} - {{LogAggregationTrakcer}} => AppLogAggregationStatusForRMRecovery (it's not tracker since it is a passive status, and it's better to emphasize purpose of this class) - {{getLastModifiedTime}} should be {{logAggregationStartedTime}} if you agree with 2.4 4) Comment of YarnConfiguration.LOG_AGGREGATION_STATUS_TIME_OUT_MS should be updated, it will be consumed by NM as well. Will include more another detailed scan in the next review. > Find a way to persist the log aggregation status > ------------------------------------------------ > > Key: YARN-7952 > URL: https://issues.apache.org/jira/browse/YARN-7952 > Project: Hadoop YARN > Issue Type: Bug > Reporter: Xuan Gong > Assignee: Xuan Gong > Priority: Major > Attachments: YARN-7952-poc.patch, YARN-7952.1.patch, YARN-7952.2.patch > > > In MAPREDUCE-6415, we have created a CLI to har the aggregated logs, and In > YARN-4946: RM should write out Aggregated Log Completion file flag next to > logs, we have a discussion on how we can get the log aggregation status: make > a client call to RM or get it directly from the Distributed file system(HDFS). > No matter which approach we would like to choose, we need to figure out a way > to persist the log aggregation status first. This ticket is used to track the > working progress for this purpose. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org