[jira] [Commented] (YARN-9721) An easy method to exclude a nodemanager from the yarn cluster cleanly
[ https://issues.apache.org/jira/browse/YARN-9721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902027#comment-16902027 ]

Zac Zhou commented on YARN-9721:
--------------------------------

Thanks, [~tangzhankun]. I prefer methods 1 and 2.
Method 2 behaves the same as method 3 when the time period parameter is set to 0.
Method 1 requires adding a member variable to RefreshNodesRequest and RMNode, which would involve a bit more work.
I'm OK with either method~

> An easy method to exclude a nodemanager from the yarn cluster cleanly
> ---------------------------------------------------------------------
>
>                 Key: YARN-9721
>                 URL: https://issues.apache.org/jira/browse/YARN-9721
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Zac Zhou
>            Priority: Major
>         Attachments: decommission nodes.png
>
>
> If we want to take a nodemanager server offline, nodes.exclude-path
> and the "rmadmin -refreshNodes" command are used to decommission the server.
> But this method cannot clean up the node cleanly. Nodemanager servers are
> still listed under Decommissioned Nodes, as the attachment shows.
> !decommission nodes.png!
> YARN-4311 enabled a removalTimer to clean up untracked nodes.
> But the logic of the isUntrackedNode method is too restrictive. If include-path is
> not used, no servers can meet the criteria. Using an include file would
> introduce a maintenance risk.
> If a yarn cluster is installed on cloud, nodemanager servers are created and
> deleted frequently. We need a way to exclude a nodemanager from the yarn
> cluster cleanly. Otherwise, the map of rmContext.getInactiveRMNodes() would
> keep growing, which would cause a memory issue in the RM.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9721) An easy method to exclude a nodemanager from the yarn cluster cleanly
[ https://issues.apache.org/jira/browse/YARN-9721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16901656#comment-16901656 ]

Zhankun Tang commented on YARN-9721:
------------------------------------

[~yuan_zac], thanks for raising this issue! This is very helpful in a hybrid environment.
I'm going through this story to get a clearer understanding. BTW, which solution do you prefer?
[jira] [Commented] (YARN-9721) An easy method to exclude a nodemanager from the yarn cluster cleanly
[ https://issues.apache.org/jira/browse/YARN-9721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900727#comment-16900727 ]

Zac Zhou commented on YARN-9721:
--------------------------------

[~sunilg] Thanks a lot for your comments~
Maybe I could use one of the following methods to clean up the inactive list.
 1. Add a parameter like "--prune-nodes" to the command "rmadmin -refreshNodes", and add a field named something like "prunable" to RMNodes. When "rmadmin -refreshNodes --prune-nodes" is executed, prunable is set to true on the RMNodes, and the RMNodes will be deleted by the removalTimer.
 2. Add a time period parameter to the yarn configuration. If an RMNode stays in the inactive list longer than that time period, delete the RMNode.
 3. Add a boolean parameter to the yarn configuration. If the parameter is true, delete the RMNodes from the inactive list directly.

[~sunilg], [~leftnoteasy], [~cheersyang], [~tangzhankun]
Any ideas~
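
A minimal sketch of method 2, assuming a simplified stand-in for the inactive-node map rather than the real ResourceManager code. The class InactiveNodePruner and its methods markInactive, prune, and isTracked are hypothetical names chosen for illustration; they are not Hadoop APIs. A node is removed once it has been inactive for at least the configured period, so a period of 0 removes nodes on the next pass, which is how method 2 can cover method 3.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the RM's inactive-node bookkeeping; not a Hadoop class.
class InactiveNodePruner {
    // Node host -> timestamp (ms) at which the node entered the inactive list.
    private final Map<String, Long> inactiveSinceMs = new ConcurrentHashMap<>();
    // Assumed configuration value: how long a node may stay in the inactive list.
    private final long retentionMs;

    InactiveNodePruner(long retentionMs) {
        if (retentionMs < 0) {
            throw new IllegalArgumentException("retentionMs must be >= 0");
        }
        this.retentionMs = retentionMs;
    }

    // Record that a node became inactive (for example, decommissioned) at nowMs.
    void markInactive(String host, long nowMs) {
        inactiveSinceMs.put(host, nowMs);
    }

    // Remove every node that has been inactive for at least retentionMs.
    // Returns the number of nodes removed in this pass.
    int prune(long nowMs) {
        int removed = 0;
        Iterator<Map.Entry<String, Long>> it = inactiveSinceMs.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMs - it.next().getValue() >= retentionMs) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    // True while the node is still held in the inactive list.
    boolean isTracked(String host) {
        return inactiveSinceMs.containsKey(host);
    }
}
```

In this sketch, prune would be called from a periodic task, in the same spirit as the removalTimer mentioned in the issue; passing the current time in as an argument keeps the logic easy to test.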
[jira] [Commented] (YARN-9721) An easy method to exclude a nodemanager from the yarn cluster cleanly
[ https://issues.apache.org/jira/browse/YARN-9721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900661#comment-16900661 ]

Sunil Govindan commented on YARN-9721:
--------------------------------------

Looping [~tangzhankun] into this thread.

[~yuan_zac], ideally, node decommissioning helps you make sure that all containers are drained and that the decommission goes smoothly. Once the node is decommissioned, you can remove it as your use case requires.
And as you mentioned, nodes that are forced out in this way should not remain in the inactive list.

cc [~leftnoteasy] [~cheersyang]