[ https://issues.apache.org/jira/browse/YARN-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16898599#comment-16898599 ]

Steven Rand commented on YARN-4946:
-----------------------------------

I noticed after upgrading a cluster to 3.2.0 that RM recovery now takes about 
20 minutes, whereas before it took less than one minute.

I checked the RM's logs, and noticed that it hits the code path added in this 
patch more than 18 million times
{code:bash}
# The log rotation settings allow for only 20 log files, so this number is
# actually lower than the real count.
$ grep 'but not removing' hadoop-palantir-resourcemanager-<hostname>.log* | wc -l
18092893
{code}
I checked in ZK, and according to {{./zkCli.sh ls 
/rmstore/ZKRMStateRoot/RMAppRoot}}, I have 9,755 apps in the RM state store, 
even though the configured max is 1,000.

I think that what happens when RM recovery starts is:
 * Some number of apps in the state store cause us to handle an 
{{APP_COMPLETED}} event during recovery. I'm not sure exactly how many – 
presumably just those that are finished?
 * Each time we handle one of these events, we call 
{{removeCompletedAppsFromStateStore}} and {{removeCompletedAppsFromMemory}}, 
and in both cases we realize that there are more apps both in ZK and in memory 
than is allowed (limit for both is 1,000).
 * So for each of these events, we go through the for loops in both 
{{removeCompletedAppsFromStateStore}} and {{removeCompletedAppsFromMemory}} 
that try to remove apps from ZK and from memory.
 * For whatever reason (probably a separate issue on this cluster), log aggregation isn't complete for any of these apps, so the for loops never manage to delete anything. And since the loops are deterministic, they try to delete the same apps on every event and never make progress (see the sketch after this list).
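To make that re-scanning behavior concrete, here's a minimal, self-contained sketch of how I understand the loop to behave. This is not the actual {{RMAppManager}} code; the class and the names {{completedApps}}, {{MAX_COMPLETED_APPS}}, and {{isLogAggregationFinished}} are placeholders for illustration only:
{code:java}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch only -- not the real RMAppManager code. All names are placeholders.
public class RemovalLoopSketch {
  static final int MAX_COMPLETED_APPS = 1_000;                  // configured limit
  static final List<String> completedApps = new ArrayList<>();  // ~9,755 entries here

  // Stand-in for the real check; on this cluster it is effectively always
  // false, so no app ever qualifies for removal.
  static boolean isLogAggregationFinished(String appId) {
    return false;
  }

  // Called once per APP_COMPLETED event handled during recovery.
  static void removeCompletedAppsFromMemory() {
    // Number of apps over the limit, i.e. removal candidates (~8,755 here).
    int candidates = completedApps.size() - MAX_COMPLETED_APPS;
    Iterator<String> it = completedApps.iterator();
    while (candidates > 0 && it.hasNext()) {
      String appId = it.next();
      candidates--;
      if (!isLogAggregationFinished(appId)) {
        // The branch behind the "but not removing" log lines: hit once per
        // candidate app, on every APP_COMPLETED event, when aggregation never
        // reaches a terminal state.
        System.out.println("Should be removed, but not removing " + appId
            + " because log aggregation is not in a terminal state");
        continue;
      }
      it.remove();  // never reached on this cluster
    }
  }
}
{code}
Because the scan order is deterministic and nothing is ever removed, every recovered {{APP_COMPLETED}} event repeats the same ~8,755 skips (and the same applies to the state-store variant of the loop).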

And I think the repetition of these for loops for each {{APP_COMPLETED}} event 
explains the 18 million number: if we can have at most 9,755 finished apps in 
the state store, and each of them triggers 2 for loops that can each run up to 
8,755 iterations (9,755 apps minus the limit of 1,000), we very quickly wind up 
with a lot of iterations.
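For a rough sanity check (my own back-of-the-envelope estimate, not something measured from the logs), the upper bound implied by those numbers is on the order of 170 million iterations:
{code:java}
// Back-of-the-envelope upper bound implied by the numbers above (an estimate,
// not a measurement).
public class RecoveryIterationEstimate {
  public static void main(String[] args) {
    long events = 9_755L;                      // finished apps replaying APP_COMPLETED
    long loopsPerEvent = 2L;                   // state-store loop + in-memory loop
    long iterationsPerLoop = 9_755L - 1_000L;  // apps over the 1,000 limit = 8,755
    System.out.println(events * loopsPerEvent * iterationsPerLoop);  // 170,810,050
  }
}
{code}
Since only 20 rotated log files were retained, the 18 million "but not removing" lines we actually see are a lower bound, which fits comfortably under that estimate.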

Because this change can lead to much longer RM recovery times in circumstances 
like this one, I think I prefer option {{a}} from the two listed above.

Or, I think it's also reasonable to modify the patch from YARN-9571 to have a 
hardcoded TTL.

> RM should not consider an application as COMPLETED when log aggregation is 
> not in a terminal state
> --------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4946
>                 URL: https://issues.apache.org/jira/browse/YARN-4946
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: log-aggregation
>    Affects Versions: 2.8.0
>            Reporter: Robert Kanter
>            Assignee: Szilard Nemeth
>            Priority: Major
>             Fix For: 3.2.0
>
>         Attachments: YARN-4946.001.patch, YARN-4946.002.patch, 
> YARN-4946.003.patch, YARN-4946.004.patch
>
>
> MAPREDUCE-6415 added a tool that combines the aggregated log files for each 
> Yarn App into a HAR file.  When run, it seeds the list by looking at the 
> aggregated logs directory, and then filters out ineligible apps.  One of the 
> criteria involves checking with the RM that an Application's log aggregation 
> status is not still running and has not failed.  When the RM "forgets" about 
> an older completed Application (e.g. RM failover, enough time has passed, 
> etc), the tool won't find the Application in the RM and will just assume that 
> its log aggregation succeeded, even if it actually failed or is still running.
> We can solve this problem by doing the following:
> The RM should not consider an app to be fully completed (and thus removed 
> from its history) until the aggregation status has reached a terminal state 
> (e.g. SUCCEEDED, FAILED, TIME_OUT).



