[
https://issues.apache.org/jira/browse/YARN-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888758#comment-16888758
]
Tao Yang commented on YARN-9686:
--------------------------------
Attached v1 patch for review.
> Reduce visibility of blacklisted nodes information (only for current app
> attempt) to avoid the abuse of memory
> --------------------------------------------------------------------------------------------------------------
>
> Key: YARN-9686
> URL: https://issues.apache.org/jira/browse/YARN-9686
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Reporter: Tao Yang
> Assignee: Tao Yang
> Priority: Major
> Attachments: YARN-9686.001.patch
>
>
> Recently we hit an issue where the RM went through a long GC pause, and we
> found a flood of WARN logs (Ignoring Blacklists, blacklist size 1775 is more
> than failure threshold ratio 0.20000000298023224 out of total usable nodes
> 1778) in the RM log, at a very high rate of more than 30,000 per second.
> The direct cause is that a few apps with a large number of attempts and many
> blacklisted nodes were requested frequently via the REST API or web UI. For
> every single request, the RM has to allocate new memory for the blacklisted
> nodes many times (N * NUM_ATTEMPTS).
> Currently both AM (system) blacklisted nodes and app blacklisted nodes are
> carried over across app attempts, and there is only one instance of each,
> so it is redundant and costly to traverse all blacklisted nodes for every
> app attempt. I therefore propose to get and show blacklisted nodes only for
> the current app attempt, to improve performance and avoid wasting memory in
> similar scenarios.
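
The cost difference described above can be illustrated with a minimal
sketch. It is not the code in the attached patch and does not use the real
ResourceManager classes: Attempt, reportAllAttempts and reportCurrentOnly
are hypothetical names, and the blacklist is modelled as a plain set of
node names shared by all attempts of one app.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BlacklistReportSketch {

    /** Hypothetical stand-in for an app attempt (not the real YARN type). */
    static final class Attempt {
        final int id;
        Attempt(int id) { this.id = id; }
    }

    /** Old behaviour (assumed): copy the shared blacklist once per attempt. */
    static List<Set<String>> reportAllAttempts(List<Attempt> attempts,
                                               Set<String> sharedBlacklist) {
        List<Set<String>> report = new ArrayList<>();
        for (Attempt ignored : attempts) {
            // One fresh allocation of N entries for every attempt.
            report.add(new HashSet<>(sharedBlacklist));
        }
        return report;
    }

    /** Proposed behaviour (assumed): copy only for the current attempt. */
    static List<Set<String>> reportCurrentOnly(List<Attempt> attempts,
                                               Attempt current,
                                               Set<String> sharedBlacklist) {
        List<Set<String>> report = new ArrayList<>();
        for (Attempt attempt : attempts) {
            if (attempt == current) {
                report.add(new HashSet<>(sharedBlacklist));
            } else {
                // Earlier attempts expose an empty, shared immutable set.
                report.add(Collections.<String>emptySet());
            }
        }
        return report;
    }

    /** Total number of node entries held by a report. */
    static long totalEntries(List<Set<String>> report) {
        long total = 0;
        for (Set<String> nodes : report) {
            total += nodes.size();
        }
        return total;
    }

    public static void main(String[] args) {
        Set<String> blacklist = new HashSet<>();
        for (int i = 0; i < 1775; i++) {
            blacklist.add("node-" + i);
        }
        List<Attempt> attempts = new ArrayList<>();
        for (int i = 1; i <= 4; i++) {
            attempts.add(new Attempt(i));
        }
        Attempt current = attempts.get(attempts.size() - 1);

        System.out.println("all attempts: "
            + totalEntries(reportAllAttempts(attempts, blacklist)));
        System.out.println("current only: "
            + totalEntries(reportCurrentOnly(attempts, current, blacklist)));
    }
}
```

In this model the per-request cost drops from N * NUM_ATTEMPTS copied
entries to N, which is the effect the proposal aims for.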