[
https://issues.apache.org/jira/browse/YARN-9721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900727#comment-16900727
]
Zac Zhou commented on YARN-9721:
--------------------------------
[~sunilg]
Thanks a lot for your comments~
Here are a few possible ways to clean up the inactive list:
# Add an option such as "--prune-nodes" to the "rmadmin -refreshNodes"
command, and add a flag such as "prunable" to RMNodes. When
"rmadmin -refreshNodes --prune-nodes" is executed, the prunable flag of the
RMNodes is set to true, and the RMNodes are then deleted by the removalTimer.
# Add a time-period parameter to the yarn configuration. If an RMNode stays in
the inactive list longer than that period, delete it.
# Add a boolean parameter to the yarn configuration. If it is true, delete
RMNodes from the inactive list directly.
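To make the three options easier to compare, here is a rough, self-contained sketch. It is not YARN code: the class InactiveNodePruner, its fields, and its methods are all hypothetical stand-ins for the inactive list and the removalTimer, and I am assuming option 3 means a decommissioned node never stays in the list at all.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the RM's inactive-node bookkeeping; not YARN code. */
class InactiveNodePruner {

    /** One inactive-list entry: when it became inactive and whether it is flagged. */
    static final class InactiveNode {
        final long inactiveSinceMs;
        volatile boolean prunable; // option 1: set when "--prune-nodes" is passed

        InactiveNode(long inactiveSinceMs) {
            this.inactiveSinceMs = inactiveSinceMs;
        }
    }

    private final Map<String, InactiveNode> inactive = new ConcurrentHashMap<>();
    private final long maxInactiveMs;     // option 2: assumed timeout setting, <= 0 disables it
    private final boolean removeDirectly; // option 3: assumed boolean setting

    InactiveNodePruner(long maxInactiveMs, boolean removeDirectly) {
        this.maxInactiveMs = maxInactiveMs;
        this.removeDirectly = removeDirectly;
    }

    /** Called when a node is decommissioned. */
    void markInactive(String nodeId, long nowMs) {
        if (removeDirectly) {
            return; // option 3: the node is dropped instead of being kept in the list
        }
        inactive.put(nodeId, new InactiveNode(nowMs));
    }

    /** Option 1: flag every node currently in the inactive list as prunable. */
    void flagAllPrunable() {
        for (InactiveNode node : inactive.values()) {
            node.prunable = true;
        }
    }

    /** What a periodic removal timer could run; returns the number of nodes removed. */
    int prune(long nowMs) {
        int removed = 0;
        Iterator<Map.Entry<String, InactiveNode>> it = inactive.entrySet().iterator();
        while (it.hasNext()) {
            InactiveNode node = it.next().getValue();
            boolean expired = maxInactiveMs > 0
                && nowMs - node.inactiveSinceMs > maxInactiveMs; // option 2
            if (node.prunable || expired) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    int size() {
        return inactive.size();
    }
}
```

In this sketch options 1 and 2 share one timer pass, so they could be combined, while option 3 bypasses the list entirely and needs no timer.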
[~sunilg], [~leftnoteasy], [~cheersyang], [~tangzhankun] Any ideas?
> An easy method to exclude a nodemanager from the yarn cluster cleanly
> ---------------------------------------------------------------------
>
> Key: YARN-9721
> URL: https://issues.apache.org/jira/browse/YARN-9721
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Zac Zhou
> Priority: Major
> Attachments: decommission nodes.png
>
>
> If we want to take a nodemanager server offline, nodes.exclude-path
> and the "rmadmin -refreshNodes" command are used to decommission the server.
> But this method cannot clean up the node cleanly. Nodemanager servers
> still appear under Decommissioned Nodes, as the attachment shows.
> !decommission nodes.png!
> YARN-4311 enabled a removalTimer to clean up untracked nodes.
> But the logic of the isUntrackedNode method is too restrictive. If include-path
> is not used, no server can meet the criteria. Using an include file would
> introduce a potential maintenance risk.
> If a yarn cluster is installed in the cloud, nodemanager servers are created
> and deleted frequently. We need a way to exclude a nodemanager from the yarn
> cluster cleanly. Otherwise, the map of rmContext.getInactiveRMNodes() would
> keep growing, which would cause a memory issue in the RM.