[jira] [Commented] (YARN-4406) RM Web UI continues to show decommissioned nodes even after RM restart
[ https://issues.apache.org/jira/browse/YARN-4406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036203#comment-15036203 ]

Daniel Templeton commented on YARN-4406:

Now that I've had a chance to look at the web UI code, I see that my theory was close, but not quite. The number of decommissioned nodes is taken from {{ClusterMetrics.getMetrics().getDecomissionedNMs()}}, which is just the count of nodes in the excludes list. The list of decommissioned nodes comes from {{ResourceManager.getRMContext().getInactiveRMNodes()}}, which contains only nodes that have been decommissioned since the last restart.

> RM Web UI continues to show decommissioned nodes even after RM restart
> --
>
> Key: YARN-4406
> URL: https://issues.apache.org/jira/browse/YARN-4406
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Reporter: Ray Chiang
> Priority: Minor
>
> If you start up a cluster, decommission a NodeManager, and restart the RM,
> the decommissioned node list will still show a positive number (1 in the case
> of 1 node) and if you click on the list, it will be empty.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
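A toy model makes the mismatch concrete. This is a hypothetical sketch and does not use the real ResourceManager classes. It assumes two things: the metric is simply the size of an excludes set, and the nodes page lists the DECOMMISSIONED entries of an inactive-node map keyed by host name. That map starts out empty after a restart.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model, not the real ResourceManager classes.
class DecommissionMismatchDemo {

    // Local stand-in for the node states relevant here.
    enum NodeState { RUNNING, DECOMMISSIONED, LOST }

    // Assumed metric source: the number of hosts in the excludes list.
    static int decommissionedCountFromExcludes(Set<String> excludedHosts) {
        return excludedHosts.size();
    }

    // Assumed page source: DECOMMISSIONED entries in the in-memory inactive-node map.
    static int decommissionedRowsShown(Map<String, NodeState> inactiveNodes) {
        int rows = 0;
        for (NodeState state : inactiveNodes.values()) {
            if (state == NodeState.DECOMMISSIONED) {
                rows++;
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // The excludes file still names one host after the restart ...
        Set<String> excludes = Set.of("nm1.example.com");
        // ... but the inactive-node map is rebuilt empty when the RM restarts.
        Map<String, NodeState> inactiveAfterRestart = new HashMap<>();

        System.out.println("count shown: " + decommissionedCountFromExcludes(excludes));
        System.out.println("rows listed: " + decommissionedRowsShown(inactiveAfterRestart));
    }
}
```

Running main prints "count shown: 1" followed by "rows listed: 0". That is the symptom in the issue description.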
[ https://issues.apache.org/jira/browse/YARN-4406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036336#comment-15036336 ]

Kuhu Shukla commented on YARN-4406:

Yes, that is right; the issue is present on trunk. During {{serviceInit}}, we could set this metric to the number of decommissioned nodes in the inactive list, since, AFAIK, we don't care about nodes that were decommissioned before the last restart.

At present:
{code}
private void setDecomissionedNMsMetrics() {
  // Counts every host in the excludes file, whether or not the RM has
  // seen that node since the last restart.
  Set<String> excludeList = hostsReader.getExcludedHosts();
  ClusterMetrics.getMetrics().setDecommisionedNMs(excludeList.size());
}
{code}

To:
{code}
private void setDecomissionedNMsMetrics() {
  // Count only nodes the RM itself has marked DECOMMISSIONED since the
  // last restart, so the metric matches the nodes listed in the web UI.
  int numDecommissioned = 0;
  for (RMNode rmNode : rmContext.getInactiveRMNodes().values()) {
    if (rmNode.getState() == NodeState.DECOMMISSIONED) {
      numDecommissioned++;
    }
  }
  ClusterMetrics.getMetrics().setDecommisionedNMs(numDecommissioned);
}
{code}
[ https://issues.apache.org/jira/browse/YARN-4406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036186#comment-15036186 ]

Kuhu Shukla commented on YARN-4406:

Thank you [~Naganarasimha]. [~rchiang], is it alright if I work on this? I am currently working in that code base for YARN-4311.
[ https://issues.apache.org/jira/browse/YARN-4406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036214#comment-15036214 ]

Ray Chiang commented on YARN-4406:

Thanks [~Naganarasimha]. I'll close up this JIRA as a duplicate. As for fixing it, I'll leave that up to you and [~templedf]. It looks like you two are further ahead than I am.
[ https://issues.apache.org/jira/browse/YARN-4406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036241#comment-15036241 ]

Sunil G commented on YARN-4406:

YARN-3226, which is a subtask of YARN-914, will split the cluster metrics into two tables (adding a node metrics table), since we also have to show decommissioning nodes. A patch has already been posted there. However, this particular case is not handled there. As progress is made on this issue, please also keep track of YARN-3226.
[ https://issues.apache.org/jira/browse/YARN-4406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036410#comment-15036410 ]

Daniel Templeton commented on YARN-4406:

That's the simplest resolution, but I was actually leaning in the other direction: making the list of decommissioned nodes include the full excludes list. I guess it comes down to how we define "decommissioned" in the UI. I interpret the excludes list as the canonical list of decommissioned nodes.

> RM Web UI continues to show decommissioned nodes even after RM restart
> --
>
> Key: YARN-4406
> URL: https://issues.apache.org/jira/browse/YARN-4406
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Reporter: Ray Chiang
> Assignee: Kuhu Shukla
> Priority: Minor
>
> If you start up a cluster, decommission a NodeManager, and restart the RM,
> the decommissioned node list will still show a positive number (1 in the case
> of 1 node) and if you click on the list, it will be empty.
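Here is one way to read the "excludes list is canonical" direction, sketched under assumptions. The hypothetical helper below builds the decommissioned list as the union of two sources: the excludes list, and any inactive node already marked DECOMMISSIONED. It then derives the count from that same set, so the metric and the page cannot disagree. The inactive-node map keyed by host name and the local NodeState enum are stand-ins, not the RM's actual types.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: treat the excludes list as the canonical decommissioned list.
class ExcludesAsCanonicalList {

    // Local stand-in for the RM's node states.
    enum NodeState { RUNNING, DECOMMISSIONED, LOST }

    // Every excluded host, plus any inactive node already marked DECOMMISSIONED,
    // deduplicated and sorted by host name.
    static Set<String> decommissionedHosts(Set<String> excludedHosts,
                                           Map<String, NodeState> inactiveNodes) {
        Set<String> hosts = new TreeSet<>(excludedHosts);
        for (Map.Entry<String, NodeState> entry : inactiveNodes.entrySet()) {
            if (entry.getValue() == NodeState.DECOMMISSIONED) {
                hosts.add(entry.getKey());
            }
        }
        return hosts;
    }

    // Deriving the count from the same set keeps the metric and the page in sync.
    static int decommissionedCount(Set<String> excludedHosts,
                                   Map<String, NodeState> inactiveNodes) {
        return decommissionedHosts(excludedHosts, inactiveNodes).size();
    }

    public static void main(String[] args) {
        // After a restart the inactive map is empty, but the excluded host is still listed.
        System.out.println(decommissionedHosts(Set.of("nm1.example.com"), Map.of()));
    }
}
```

With this approach, the open question is what the page should show for an excluded host the RM has never seen. It has no RMNode, so only the host name is known.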
[ https://issues.apache.org/jira/browse/YARN-4406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036389#comment-15036389 ]

Ray Chiang commented on YARN-4406:

That looks good to me.
[ https://issues.apache.org/jira/browse/YARN-4406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036616#comment-15036616 ]

Kuhu Shukla commented on YARN-4406:

I agree; I was thinking about that too. During {{registerwithRM()}} the NodeManager throws a YarnException, while on the ResourceTrackerService side we just send a NodeAction of SHUTDOWN. We could also add this node to the InactiveRMNode list at that point, so that it is consistent. Let me know what you think. I will put up a patch soon.
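Below is a rough sketch of the proposed registration change. It assumes a simplified tracker that rejects excluded hosts with SHUTDOWN. The class, its register method, and the local NodeAction and NodeState enums are hypothetical stand-ins, not the real ResourceTrackerService API. The point is only that the rejected node is also recorded as DECOMMISSIONED in the inactive-node map.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-in for the RM side of NodeManager registration.
class RegisterExcludedNodeSketch {

    enum NodeAction { NORMAL, SHUTDOWN }
    enum NodeState { RUNNING, DECOMMISSIONED }

    private final Set<String> excludedHosts;
    // Stand-in for the RM's inactive-node map, keyed by host name for simplicity.
    private final Map<String, NodeState> inactiveNodes = new ConcurrentHashMap<>();

    RegisterExcludedNodeSketch(Set<String> excludedHosts) {
        this.excludedHosts = excludedHosts;
    }

    // Refuse excluded hosts, but remember them as decommissioned.
    NodeAction register(String host) {
        if (excludedHosts.contains(host)) {
            // Proposed addition: record the rejected node so the UI list and
            // the decommissioned-node count agree after a restart.
            inactiveNodes.put(host, NodeState.DECOMMISSIONED);
            return NodeAction.SHUTDOWN;
        }
        return NodeAction.NORMAL;
    }

    Map<String, NodeState> inactiveNodes() {
        return inactiveNodes;
    }

    public static void main(String[] args) {
        RegisterExcludedNodeSketch tracker =
                new RegisterExcludedNodeSketch(Set.of("nm1.example.com"));
        System.out.println(tracker.register("nm1.example.com")); // SHUTDOWN
        System.out.println(tracker.register("nm2.example.com")); // NORMAL
        System.out.println(tracker.inactiveNodes()); // {nm1.example.com=DECOMMISSIONED}
    }
}
```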
[ https://issues.apache.org/jira/browse/YARN-4406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034604#comment-15034604 ]

Daniel Templeton commented on YARN-4406:

Looking at {{NodesListManager}}, it looks to me like {{ClusterMetrics.getMetrics().getDecomissionedNMs()}} is set to the size of the excludes list, but {{getUnusableNodes()}} returns the list of nodes that have been decommissioned since the last RM restart.
[ https://issues.apache.org/jira/browse/YARN-4406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034480#comment-15034480 ]

Kuhu Shukla commented on YARN-4406:

[~rchiang], thank you for reporting this. I think the root cause is in {{updateMetricsForDeactivatedNode}}: when a node is re-added, the decommissioned node count is not decremented as expected. The method switches on the node's previous state, and there is no case for decommissioned nodes.
[ https://issues.apache.org/jira/browse/YARN-4406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034490#comment-15034490 ]

Ray Chiang commented on YARN-4406:

Thanks for letting me know. I'll take a look at that.
[ https://issues.apache.org/jira/browse/YARN-4406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034647#comment-15034647 ]

Ray Chiang commented on YARN-4406:

So, would the right solution be to make {{getUnusableNodes()}} aware of the excludes list?
[ https://issues.apache.org/jira/browse/YARN-4406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034482#comment-15034482 ]

Kuhu Shukla commented on YARN-4406:

Spoke too soon. On trunk, {{updateMetricsForRejoinedNode}} should handle that.
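The root-cause discussion around {{updateMetricsForDeactivatedNode}} and {{updateMetricsForRejoinedNode}} turns on one question: does the counter update switch on the node's previous state, and does it have a case for decommissioned nodes? The sketch below is a hypothetical counter model, not the RM's ClusterMetrics code. It shows a rejoin update in which the DECOMMISSIONED case decrements the decommissioned count.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical per-state node counters, a stand-in for ClusterMetrics.
class RejoinMetricsSketch {

    enum NodeState { RUNNING, DECOMMISSIONED, LOST, REBOOTED }

    private final Map<NodeState, Integer> counts = new EnumMap<>(NodeState.class);

    RejoinMetricsSketch() {
        for (NodeState state : NodeState.values()) {
            counts.put(state, 0);
        }
    }

    void increment(NodeState state) {
        counts.merge(state, 1, Integer::sum);
    }

    int get(NodeState state) {
        return counts.get(state);
    }

    // A node that was inactive in previousState registers again and becomes RUNNING.
    void updateForRejoinedNode(NodeState previousState) {
        switch (previousState) {
            case DECOMMISSIONED: // without this case the decommissioned count never drops
            case LOST:
            case REBOOTED:
                counts.merge(previousState, -1, Integer::sum);
                break;
            default:
                // Not an inactive state, so there is no inactive counter to decrement.
                break;
        }
        increment(NodeState.RUNNING);
    }

    public static void main(String[] args) {
        RejoinMetricsSketch metrics = new RejoinMetricsSketch();
        metrics.increment(NodeState.DECOMMISSIONED);              // node is decommissioned
        metrics.updateForRejoinedNode(NodeState.DECOMMISSIONED);  // the same node rejoins
        System.out.println(metrics.get(NodeState.DECOMMISSIONED)); // prints 0
    }
}
```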
[ https://issues.apache.org/jira/browse/YARN-4406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035223#comment-15035223 ]

Naganarasimha G R commented on YARN-4406:

Hi [~rchiang] & [~kshukla], YARN-3102 is also about the same issue. I had stopped working on it earlier because I suspected that YARN-914 (or its sub-JIRAs) might affect or take care of this issue. If you have already narrowed down the cause, feel free to assign YARN-3102 to yourself and close this issue.