[
https://issues.apache.org/jira/browse/YARN-7952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16370898#comment-16370898
]
Xuan Gong commented on YARN-7952:
---------------------------------
Right now, the NM periodically sends its own log aggregation status to the RM.
The RM aggregates the status for each application, but it does not generate the
final status until a client call (from the web UI or CLI) triggers it.
But RM never persists the log aggregation status. So, when RM restarts/fails
over, the log aggregation status will become “NOT_STARTED”. This is confusing;
maybe we should change it to “NOT_AVAILABLE” (will create a separate ticket for
this). Anyway, we need to persist the log aggregation status for future use.
Option one: the centralized approach.
Create a new service called LogAggregationTrackingService in RM which will
track the log aggregation status for all applications. We can also introduce
“EXPIRY_INTERVAL_MS”. The service can wake up periodically to check the log
aggregation progress. The LogAggregationTrackingService would be similar to a
LivenessMonitor (such as AMLivenessMonitor). After EXPIRY_INTERVAL_MS, the
service would trigger an update RMStateStore event to persist the final log
aggregation status. So, we need to add one more RMStateStore event for every
application. Also, if an RM restart/fail-over happens within the
EXPIRY_INTERVAL_MS, we still lose the log aggregation status.
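A rough sketch of how option one could behave, to make the expiry check
concrete. Everything here is hypothetical: the class name, the StatusStore
interface (a stand-in for the new RMStateStore event), and the assumption that
the interval is measured from the last status update for an application, as a
LivenessMonitor measures from the last ping.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of option one; not an existing YARN class.
class LogAggregationTrackerSketch {

    // Stand-in for the extra RMStateStore update event (assumed interface).
    interface StatusStore {
        void persistFinalStatus(String appId, String status);
    }

    // Latest aggregated status for one application and when it was updated.
    private static final class Tracked {
        final String status;
        final long lastUpdatedMs;

        Tracked(String status, long lastUpdatedMs) {
            this.status = status;
            this.lastUpdatedMs = lastUpdatedMs;
        }
    }

    private final long expiryIntervalMs;
    private final StatusStore store;
    private final Map<String, Tracked> tracked = new HashMap<>();

    LogAggregationTrackerSketch(long expiryIntervalMs, StatusStore store) {
        this.expiryIntervalMs = expiryIntervalMs;
        this.store = store;
    }

    // Called whenever the RM has a fresh aggregated status for an application.
    synchronized void update(String appId, String status, long nowMs) {
        tracked.put(appId, new Tracked(status, nowMs));
    }

    // Called periodically by the monitor thread. Persists and stops tracking
    // every application whose status has not changed for expiryIntervalMs.
    // Returns the ids of the applications persisted in this pass.
    synchronized List<String> checkExpired(long nowMs) {
        List<String> persisted = new ArrayList<>();
        Iterator<Map.Entry<String, Tracked>> it = tracked.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Tracked> e = it.next();
            if (nowMs - e.getValue().lastUpdatedMs >= expiryIntervalMs) {
                store.persistFinalStatus(e.getKey(), e.getValue().status);
                persisted.add(e.getKey());
                it.remove();
            }
        }
        return persisted;
    }
}
```

Any status still held only in the in-memory map when the RM goes down is lost,
which is the window mentioned above.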
Option two: only care about log aggregation status for the latest applications.
This approach will not persist the log aggregation status, so we will not need
to trigger a new RMStateStore event. When NM sends the log aggregation status
to RM, it will save a copy in its own memory (do we need to persist it in the
NM state store?). We also introduce “EXPIRY_INTERVAL_MS”. When RM restarts/fails
over, NM would re-register with RM. At this time, NM would send the previous
copy of the log aggregation status to RM, provided it is still within the
configured “EXPIRY_INTERVAL_MS” (current_timestamp - last_updated_timestamp <=
EXPIRY_INTERVAL_MS). So, the RM could re-generate the log aggregation status.
Most of the changes will happen on NM side.
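A rough sketch of the NM-side cache for option two, mainly to show the expiry
filter applied at re-registration. The class and method names are hypothetical,
statuses are plain strings for brevity, and the cache is in memory only (the NM
state store question above is left open).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of option two; not an existing YARN class.
class NmLogAggregationCacheSketch {

    // Copy of the last status sent to the RM for one application.
    private static final class CachedStatus {
        final String status;
        final long lastUpdatedMs;

        CachedStatus(String status, long lastUpdatedMs) {
            this.status = status;
            this.lastUpdatedMs = lastUpdatedMs;
        }
    }

    private final long expiryIntervalMs;
    private final Map<String, CachedStatus> cache = new HashMap<>();

    NmLogAggregationCacheSketch(long expiryIntervalMs) {
        this.expiryIntervalMs = expiryIntervalMs;
    }

    // Called each time the NM sends a log aggregation status to the RM.
    synchronized void recordSent(String appId, String status, long nowMs) {
        cache.put(appId, new CachedStatus(status, nowMs));
    }

    // Called when the NM re-registers after an RM restart/fail-over.
    // Keeps only entries that satisfy
    // current_timestamp - last_updated_timestamp <= EXPIRY_INTERVAL_MS.
    synchronized Map<String, String> statusesForReRegistration(long nowMs) {
        Map<String, String> toResend = new HashMap<>();
        for (Map.Entry<String, CachedStatus> e : cache.entrySet()) {
            if (nowMs - e.getValue().lastUpdatedMs <= expiryIntervalMs) {
                toResend.put(e.getKey(), e.getValue().status);
            }
        }
        return toResend;
    }
}
```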
Option three: Option one + Option two
> Find a way to persist the log aggregation status
> ------------------------------------------------
>
> Key: YARN-7952
> URL: https://issues.apache.org/jira/browse/YARN-7952
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Xuan Gong
> Assignee: Xuan Gong
> Priority: Major
>
> In MAPREDUCE-6415, we have created a CLI to har the aggregated logs, and in
> YARN-4946: RM should write out Aggregated Log Completion file flag next to
> logs, we have a discussion on how we can get the log aggregation status: make
> a client call to RM or get it directly from the distributed file system (HDFS).
> No matter which approach we choose, we need to figure out a way to persist
> the log aggregation status first. This ticket is used to track the progress
> of that work.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)