[ https://issues.apache.org/jira/browse/YARN-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tao Yang updated YARN-9686:
---------------------------
    Attachment: YARN-9686.001.patch

> Reduce visibility of blacklisted nodes information (only for current app attempt) to avoid the abuse of memory
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-9686
>                 URL: https://issues.apache.org/jira/browse/YARN-9686
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Major
>         Attachments: YARN-9686.001.patch
>
>
> Recently we hit an issue where the RM ran a long GC, and we found many WARN
> logs (Ignoring Blacklists, blacklist size 1775 is more than failure threshold
> ratio 0.20000000298023224 out of total usable nodes 1778) in the RM log at a
> very high frequency, more than 30,000 per second.
> The direct cause is that a few apps with a large number of attempts and many
> blacklisted nodes were requested frequently via the REST API or web UI. For
> every single request, the RM has to allocate new memory for the blacklisted
> nodes many times (N * NUM_ATTEMPTS).
> Currently both the AM (system) blacklisted nodes and the app blacklisted nodes
> are transferred among app attempts, and there is only one instance of each.
> It is therefore redundant and costly to traverse all blacklisted nodes for
> every app attempt, so I propose to get and show blacklisted nodes only for the
> current app attempt, to improve performance and avoid the abuse of memory in
> similar scenarios.
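
To illustrate the cost described above, here is a minimal sketch, not the actual ResourceManager code. It assumes each app attempt returns a fresh copy of one shared blacklist whenever it is queried, which is one plausible reading of "allocate new memory" in the description. The class `BlacklistCopySketch`, the nested `Attempt` type, and all method names are hypothetical. The 1775 blacklisted nodes come from the WARN log quoted above; the figure of 10 attempts is an assumed example value.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BlacklistCopySketch {

    /** Hypothetical stand-in for an app attempt; this is not a YARN class. */
    static final class Attempt {
        // Single blacklist instance shared by every attempt of the app.
        private final Set<String> sharedBlacklist;

        Attempt(Set<String> sharedBlacklist) {
            this.sharedBlacklist = sharedBlacklist;
        }

        /** Assumption: each query returns a defensive copy, so it allocates a new set. */
        Set<String> getBlacklistedNodes() {
            return new HashSet<>(sharedBlacklist);
        }
    }

    /** Builds numAttempts attempts that all share one blacklist of the given size. */
    static List<Attempt> buildAttempts(int numAttempts, int numBlacklistedNodes) {
        Set<String> shared = new HashSet<>();
        for (int i = 0; i < numBlacklistedNodes; i++) {
            shared.add("node-" + i);
        }
        List<Attempt> attempts = new ArrayList<>();
        for (int i = 0; i < numAttempts; i++) {
            attempts.add(new Attempt(shared));
        }
        return attempts;
    }

    /** Node entries copied per request when every attempt reports the blacklist. */
    static long copiedForAllAttempts(List<Attempt> attempts) {
        long copied = 0;
        for (Attempt attempt : attempts) {
            copied += attempt.getBlacklistedNodes().size();
        }
        return copied;
    }

    /** Node entries copied per request when only the current (last) attempt reports it. */
    static long copiedForCurrentAttempt(List<Attempt> attempts) {
        if (attempts.isEmpty()) {
            return 0;
        }
        return attempts.get(attempts.size() - 1).getBlacklistedNodes().size();
    }

    public static void main(String[] args) {
        List<Attempt> attempts = buildAttempts(10, 1775);
        System.out.println("all attempts: " + copiedForAllAttempts(attempts));     // 17750
        System.out.println("current only: " + copiedForCurrentAttempt(attempts));  // 1775
    }
}
```

Under these assumptions the per-request copying grows linearly with the number of attempts, while restricting the report to the current attempt keeps it at the size of a single blacklist.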