[
https://issues.apache.org/jira/browse/YARN-7952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16386700#comment-16386700
]
Wangda Tan commented on YARN-7952:
----------------------------------
Thanks [~xgong],
It might be better to update default value in a separate JIRA since it is not
directly related to this one.
1) nmLogAggregationStatusTracker should be part of NodeManager.java instead of
CM.
2) NMLogAggregationStatusTracker:
2.1 It looks like pullLock should be a readLock and updateLocker should be a
writeLock since pull doesn't change internal data, correct?
2.2 It's better to rename pullLocker/updateLocker to read/writeLock for
readability.
2.3 For overall workflow of this class, IMO it should be:
a. When log aggregation starts, application will be added to
{{NMLogAggregationStatusTracker}} with timestamp.
b. When RM acknowledges application finished, application will be removed from
the {{NMLogAggregationStatusTracker}}
c. If configured timeout reaches, application will be removed from the
{{NMLogAggregationStatusTracker}}.
I can see c. is handled by {{rollLogAggregationStatus}} and a. is handled by
{{updateLogAggregationStatus}}, but not sure about b. When I look at logic
below, I'm not sure if it works:
{code}
if (currentTime - updateTime > rollingInterval) {
LOG.warn("Ignore the log aggregation status update request "
+ "for the application:" + appId + ". The log aggregation
status"
+ " update time is " + updateTime + " while the request process
"
+ "time is " + currentTime + ".");
return;
}
{code}
Since the currentTime always almost equal to updateTime. The if statement
should be false, correct? And I think if application is null in NM side, we
should not add it to the track, correct?
Instead of this, should we explicitly notify {{NMLogAggregationStatusTracker}}
in {{ApplicationImpl.AppLogsAggregatedTransition}}? (It is true that
{{ApplicationImpl.AppLogsAggregatedTransition}} means RM acks the log is
finished?)
2.4 I'm not sure if we should ever change the {{lastModifiedTime}}. IIUC, the
RM will timeout log aggregation status after configured timeout. What's the
purpose of updating the {{lastModifiedTime}}? Will this cause OOM issue
potentially?
2.5 updateLogAggregationStatus:
- the {{long updateTime}} parameter is not necessary, can always use
System.currentTime instead.
3) {{LogAggregationTrakcer}}
- {{LogAggregationTrakcer}} => AppLogAggregationStatusForRMRecovery (it's not
tracker since it is a passive status, and it's better to emphasize purpose of
this class)
- {{getLastModifiedTime}} should be {{logAggregationStartedTime}} if you agree
with 2.4
4) Comment of YarnConfiguration.LOG_AGGREGATION_STATUS_TIME_OUT_MS should be
updated, it will be consumed by NM as well.
Will include more another detailed scan in the next review.
> Find a way to persist the log aggregation status
> ------------------------------------------------
>
> Key: YARN-7952
> URL: https://issues.apache.org/jira/browse/YARN-7952
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Xuan Gong
> Assignee: Xuan Gong
> Priority: Major
> Attachments: YARN-7952-poc.patch, YARN-7952.1.patch, YARN-7952.2.patch
>
>
> In MAPREDUCE-6415, we have created a CLI to har the aggregated logs, and In
> YARN-4946: RM should write out Aggregated Log Completion file flag next to
> logs, we have a discussion on how we can get the log aggregation status: make
> a client call to RM or get it directly from the Distributed file system(HDFS).
> No matter which approach we would like to choose, we need to figure out a way
> to persist the log aggregation status first. This ticket is used to track the
> working progress for this purpose.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]