[jira] [Commented] (YARN-9437) RMNodeImpls occupy too much memory and causes RM GC to take a long time
[ https://issues.apache.org/jira/browse/YARN-9437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833205#comment-16833205 ]

qiuliang commented on YARN-9437:
--------------------------------

Hi [~eepayne], [~cheersyang], could you please check my earlier comment and share your thoughts? Thank you.

> RMNodeImpls occupy too much memory and causes RM GC to take a long time
> -----------------------------------------------------------------------
>
>                 Key: YARN-9437
>                 URL: https://issues.apache.org/jira/browse/YARN-9437
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.9.1
>            Reporter: qiuliang
>            Priority: Minor
>         Attachments: 1.png, 2.png, 3.png, YARN-9437-v1.txt
>
>
> We use hadoop-2.9.1 in our production environment with 1600+ nodes. 95.63% of
> RM memory is occupied by RMNodeImpl. Analysis of RM memory found that each
> RMNodeImpl has approximately 14M. The reason is that there is a 130,000+
> completedcontainers in each RMNodeImpl that has not been released.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9437) RMNodeImpls occupy too much memory and causes RM GC to take a long time
[ https://issues.apache.org/jira/browse/YARN-9437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16817515#comment-16817515 ]

Hadoop QA commented on YARN-9437:
---------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s{color} | {color:red} YARN-9437 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-9437 |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/23950/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (YARN-9437) RMNodeImpls occupy too much memory and causes RM GC to take a long time
[ https://issues.apache.org/jira/browse/YARN-9437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811738#comment-16811738 ]

qiuliang commented on YARN-9437:
--------------------------------

As I understand it, there are two cases in which the completedContainers in RMNodeImpl may never be released.

1. When RMAppAttemptImpl receives a CONTAINER_FINISHED event for a container other than the amContainer, it adds that container to justFinishedContainers. When processing an AM heartbeat, RMAppAttemptImpl first sends the containers in finishedContainersSentToAM to the NM, and RMNodeImpl removes those containers from completedContainers. It then moves the containers in justFinishedContainers to finishedContainersSentToAM, where they wait for the next AM heartbeat to be sent to the NM. If RMAppAttemptImpl receives the AM unregistration event while justFinishedContainers is not empty, those containers may never get the chance to move to finishedContainersSentToAM, so they are never sent to the NM and RMNodeImpl never releases them.

2. When RMAppAttemptImpl is already in a final state and receives a CONTAINER_FINISHED event, it only adds the container to justFinishedContainers and does not send it to the NM.

For the first case, my idea is that when RMAppAttemptImpl handles the amContainer finished event, it moves the containers in justFinishedContainers to finishedContainersSentToAM and sends them to the NM along with the amContainer. I am not sure whether this has any other impact.

For the second case, when RMAppAttemptImpl is in a final state and receives a CONTAINER_FINISHED event, the container could be sent directly to the NM, but I am worried that this would generate many events.
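The first case and the proposed fix can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not the actual ResourceManager code: the classes Node and Attempt and the methods containerFinished, amHeartbeat, amContainerFinished and leakedAfterUnregister are hypothetical stand-ins, events are replaced by direct method calls, and only the names justFinishedContainers, finishedContainersSentToAM and completedContainers are taken from the description above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class FinishedContainerSketch {

    /** Hypothetical stand-in for RMNodeImpl: keeps completed containers until told to release them. */
    static final class Node {
        final Set<String> completedContainers = new HashSet<>();

        void release(List<String> containerIds) {
            completedContainers.removeAll(containerIds);
        }
    }

    /** Hypothetical stand-in for RMAppAttemptImpl and its two bookkeeping lists. */
    static final class Attempt {
        final Node node;
        final List<String> justFinishedContainers = new ArrayList<>();
        final List<String> finishedContainersSentToAM = new ArrayList<>();

        Attempt(Node node) {
            this.node = node;
        }

        /** A non-AM container finished: the node records it and the attempt queues it. */
        void containerFinished(String containerId) {
            node.completedContainers.add(containerId);
            justFinishedContainers.add(containerId);
        }

        /** AM heartbeat: release the previous batch on the node, then stage the new batch. */
        void amHeartbeat() {
            node.release(finishedContainersSentToAM);
            finishedContainersSentToAM.clear();
            finishedContainersSentToAM.addAll(justFinishedContainers);
            justFinishedContainers.clear();
        }

        /**
         * The amContainer finished. With flushPending == true this models the proposed fix:
         * containers still waiting in justFinishedContainers are moved over and released too.
         */
        void amContainerFinished(boolean flushPending) {
            if (flushPending) {
                finishedContainersSentToAM.addAll(justFinishedContainers);
                justFinishedContainers.clear();
            }
            node.release(finishedContainersSentToAM);
            finishedContainersSentToAM.clear();
        }
    }

    /** Returns how many containers the node still holds after the AM goes away. */
    static int leakedAfterUnregister(boolean flushPending) {
        Node node = new Node();
        Attempt attempt = new Attempt(node);
        attempt.containerFinished("c1");
        attempt.amHeartbeat();              // c1 is staged in finishedContainersSentToAM
        attempt.containerFinished("c2");    // c2 is still in justFinishedContainers
        attempt.amContainerFinished(flushPending);
        return node.completedContainers.size();
    }

    public static void main(String[] args) {
        System.out.println(leakedAfterUnregister(false)); // prints 1: c2 is never released
        System.out.println(leakedAfterUnregister(true));  // prints 0: the flush releases c2 as well
    }
}
```

In this model the leak comes purely from the ordering: a container that finishes between the last AM heartbeat and the AM unregistration has no later heartbeat to carry it to the node. The second case would need a separate path, since a finished attempt receives no further heartbeats at all.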