[jira] [Commented] (YARN-9721) An easy method to exclude a nodemanager from the yarn cluster cleanly

2019-08-07 Thread Zac Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902027#comment-16902027
 ] 

Zac Zhou commented on YARN-9721:


Thanks, [~tangzhankun],

I prefer methods 1 and 2.

Method 2 can work the same as method 3 when the time period parameter is set 
to 0.

Method 1 needs a new member variable in both RefreshNodesRequest and RMNode, 
which would involve a bit more work.

I'm OK with either method.

> An easy method to exclude a nodemanager from the yarn cluster cleanly
> -
>
> Key: YARN-9721
> URL: https://issues.apache.org/jira/browse/YARN-9721
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Zac Zhou
> Priority: Major
> Attachments: decommission nodes.png
>
>
> To take a nodemanager server offline, nodes.exclude-path and the
> "rmadmin -refreshNodes" command are used to decommission the server.
> But this method cannot clean up the node completely. Nodemanager servers
> still appear under Decommissioned Nodes, as the attachment shows.
>   !decommission nodes.png!
> YARN-4311 enabled a removalTimer to clean up untracked nodes.
> But the logic of the isUntrackedNode method is too restrictive. If include-path
> is not used, no server can meet the criteria, and using an include file adds
> a maintenance risk.
> If a yarn cluster is installed in the cloud, nodemanager servers are created
> and deleted frequently. We need a way to exclude a nodemanager from the yarn
> cluster cleanly. Otherwise, the map returned by rmContext.getInactiveRMNodes()
> would keep growing, which would cause a memory issue in the RM.
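
To illustrate the restriction described above, here is a simplified model in Java. It is a sketch of the condition as the description states it, not a copy of the Hadoop source: the class name, method signature, and the use of plain host-name sets are assumptions made for illustration. It assumes a node counts as untracked only when an include list is in use and the host appears in neither the include nor the exclude list.

```java
import java.util.Collections;
import java.util.Set;

/** Hypothetical, simplified model of the untracked-node check; not Hadoop code. */
final class UntrackedNodeCheck {

    /**
     * Returns true only if an include list is configured and the host is
     * listed in neither the include set nor the exclude set.
     */
    static boolean isUntracked(String host, Set<String> includes, Set<String> excludes) {
        return !includes.isEmpty()
            && !includes.contains(host)
            && !excludes.contains(host);
    }

    public static void main(String[] args) {
        Set<String> none = Collections.emptySet();
        Set<String> includeNm1 = Collections.singleton("nm1");

        // No include file in use: the first condition fails for every host,
        // so the removal timer never sees an untracked node.
        System.out.println(isUntracked("nm2", none, none));

        // Include file in use and nm2 absent from both lists: nm2 qualifies.
        System.out.println(isUntracked("nm2", includeNm1, none));
    }
}
```

Under this model, a cluster that relies only on the exclude file never has a node that qualifies for removal, which is why the inactive map keeps growing.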



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9721) An easy method to exclude a nodemanager from the yarn cluster cleanly

2019-08-06 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16901656#comment-16901656
 ] 

Zhankun Tang commented on YARN-9721:


[~yuan_zac], thanks for raising this issue! This is very helpful in a hybrid 
environment.

I'm going through this story to get a clearer understanding. BTW, which 
solution do you prefer?




[jira] [Commented] (YARN-9721) An easy method to exclude a nodemanager from the yarn cluster cleanly

2019-08-06 Thread Zac Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900727#comment-16900727
 ] 

Zac Zhou commented on YARN-9721:


[~sunilg]

Thanks a lot for your comments~

Maybe I could add a way to clean up the inactive list. Some options:
 # Add a parameter such as "--prune-nodes" to the command "rmadmin 
-refreshNodes", and add a field such as "prunable" to RMNode. When 
"rmadmin -refreshNodes --prune-nodes" is executed, prunable is set to true on 
the RMNodes, and they are then deleted by the removalTimer.
 # Add a time period parameter to the yarn configuration. If an RMNode stays in 
the inactive list longer than that period, delete it.
 # Add a boolean parameter to the yarn configuration. If it is true, delete 
RMNodes from the inactive list directly.
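
The three options above could be sketched roughly as follows. This is a minimal, self-contained Java model, not ResourceManager code: InactiveNodePruner, InactiveNode, retentionMs, and markAllPrunable are hypothetical names, and "--prune-nodes" and "prunable" are proposals from this comment rather than existing options. A retention period of 0 is assumed to mean "remove on the next timer pass", which is how option 2 would cover option 3.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the RM's inactive-node map; not Hadoop code. */
final class InactiveNodePruner {

    /** Hypothetical record for one inactive node. */
    static final class InactiveNode {
        final long inactiveSinceMs;  // time the node entered the inactive list
        volatile boolean prunable;   // option 1: set by the proposed --prune-nodes flag

        InactiveNode(long inactiveSinceMs) {
            this.inactiveSinceMs = inactiveSinceMs;
        }
    }

    private final Map<String, InactiveNode> inactiveNodes = new ConcurrentHashMap<>();

    // Option 2: hypothetical retention period. 0 behaves like option 3;
    // a negative value disables time-based pruning.
    private final long retentionMs;

    InactiveNodePruner(long retentionMs) {
        this.retentionMs = retentionMs;
    }

    void markInactive(String host, long nowMs) {
        inactiveNodes.put(host, new InactiveNode(nowMs));
    }

    /** Option 1: what a "refreshNodes --prune-nodes" call might trigger. */
    void markAllPrunable() {
        for (InactiveNode node : inactiveNodes.values()) {
            node.prunable = true;
        }
    }

    /** One pass of a removal timer; returns the number of nodes removed. */
    int prune(long nowMs) {
        int removed = 0;
        Iterator<Map.Entry<String, InactiveNode>> it = inactiveNodes.entrySet().iterator();
        while (it.hasNext()) {
            InactiveNode node = it.next().getValue();
            boolean expired = retentionMs >= 0 && nowMs - node.inactiveSinceMs >= retentionMs;
            if (node.prunable || expired) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    int size() {
        return inactiveNodes.size();
    }

    public static void main(String[] args) {
        InactiveNodePruner pruner = new InactiveNodePruner(1000L);
        pruner.markInactive("nm1", 0L);
        pruner.markInactive("nm2", 500L);
        System.out.println("removed by age: " + pruner.prune(1000L));
        pruner.markAllPrunable();
        System.out.println("removed by flag: " + pruner.prune(1000L));
    }
}
```

Keeping both conditions in one timer pass means option 1 and option 2 do not exclude each other: an operator could prune on demand while the retention period bounds the map size in the background.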

[~sunilg], [~leftnoteasy], [~cheersyang], [~tangzhankun] Any ideas?




[jira] [Commented] (YARN-9721) An easy method to exclude a nodemanager from the yarn cluster cleanly

2019-08-06 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900661#comment-16900661
 ] 

Sunil Govindan commented on YARN-9721:
--

Looping [~tangzhankun] into this thread.

[~yuan_zac], ideally node decommissioning helps you make sure all 
containers are drained so that the node can be decommissioned smoothly. Once 
the node is decommissioned, you can remove it as your use case requires. And 
as you mentioned, nodes that are forced out should not stay in the inactive 
list.

cc [~leftnoteasy] [~cheersyang]
