[jira] [Commented] (YARN-4406) RM Web UI continues to show decommissioned nodes even after RM restart

2015-12-02 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036203#comment-15036203
 ] 

Daniel Templeton commented on YARN-4406:


Now that I've had a chance to look at the web UI code, I see that my theory was 
close, but not quite.  The number of decommissioned nodes is taken from 
{{ClusterMetrics.getMetrics().getDecomissionedNMs()}}, which is just the count 
of nodes in the excludes list.  The list of decommissioned nodes comes from 
{{ResourceManager.getRMContext().getInactiveRMNodes()}}, which contains only 
nodes that have been decommissioned since the last restart.
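
To make the mismatch concrete, here is a minimal sketch that models the two data sources with plain Java collections. The class and method names ({{DecommissionMismatch}}, {{displayedCount}}, {{displayedList}}) are hypothetical stand-ins, not ResourceManager code, and the sketch assumes the excludes list is re-read from its file on startup while the inactive node list lives only in memory.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the two data sources behind the web UI; not Hadoop code.
final class DecommissionMismatch {
    // Assumption: backed by the excludes file, so it survives an RM restart.
    private final Set<String> excludesList = new HashSet<>();
    // Assumption: held in memory only, so it is empty after an RM restart.
    private final List<String> inactiveNodes = new ArrayList<>();

    // Decommissioning adds the host to both sources.
    void decommission(String host) {
        excludesList.add(host);
        inactiveNodes.add(host);
    }

    // A restart keeps the excludes list but loses the in-memory inactive nodes.
    void restart() {
        inactiveNodes.clear();
    }

    // What the UI shows as the number of decommissioned nodes.
    int displayedCount() {
        return excludesList.size();
    }

    // What the UI shows when the decommissioned node list is opened.
    List<String> displayedList() {
        return new ArrayList<>(inactiveNodes);
    }
}
```

After one {{decommission}} call followed by {{restart}}, the count in this model stays at 1 while the list is empty, which matches the reported symptom.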

> RM Web UI continues to show decommissioned nodes even after RM restart
> --
>
> Key: YARN-4406
> URL: https://issues.apache.org/jira/browse/YARN-4406
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Ray Chiang
>Priority: Minor
>
> If you start up a cluster, decommission a NodeManager, and restart the RM, 
> the decommissioned node list will still show a positive number (1 in the case 
> of 1 node) and if you click on the list, it will be empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4406) RM Web UI continues to show decommissioned nodes even after RM restart

2015-12-02 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036336#comment-15036336
 ] 

Kuhu Shukla commented on YARN-4406:
---

Yes, that is right; the issue is present on trunk. During {{serviceInit}} we 
could set this metric to the number of decommissioned nodes in the inactive 
list, since AFAIK we don't care about nodes that were decommissioned before 
the last restart. 

At present:
{code}
  private void setDecomissionedNMsMetrics() {
    // The metric is set to the size of the excludes list.
    Set<String> excludeList = hostsReader.getExcludedHosts();
    ClusterMetrics.getMetrics().setDecommisionedNMs(excludeList.size());
  }
{code}

To:
{code}
  private void setDecomissionedNMsMetrics() {
    int numDecommissioned = 0;
    for (RMNode rmNode : rmContext.getInactiveRMNodes().values()) {
      if (rmNode.getState() == NodeState.DECOMMISSIONED) {
        numDecommissioned++;
      }
    }
    ClusterMetrics.getMetrics().setDecommisionedNMs(numDecommissioned);
  }
{code}




[jira] [Commented] (YARN-4406) RM Web UI continues to show decommissioned nodes even after RM restart

2015-12-02 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036186#comment-15036186
 ] 

Kuhu Shukla commented on YARN-4406:
---

Thank you [~Naganarasimha]. Asking [~rchiang] if it's all right for me to work 
on it. I am currently working in that code base for YARN-4311.



[jira] [Commented] (YARN-4406) RM Web UI continues to show decommissioned nodes even after RM restart

2015-12-02 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036214#comment-15036214
 ] 

Ray Chiang commented on YARN-4406:
--

Thanks [~Naganarasimha].  I'll close up this JIRA as a duplicate.

As for fixing it, I'll leave that up to you and [~templedf].  It looks like you 
two are further ahead than I am.



[jira] [Commented] (YARN-4406) RM Web UI continues to show decommissioned nodes even after RM restart

2015-12-02 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036241#comment-15036241
 ] 

Sunil G commented on YARN-4406:
---

YARN-3226, which is a subtask of YARN-914, will split the cluster metrics into 
two tables (adding a node metrics table), since we also have to show 
decommissioning nodes. 
A patch has already been posted there. However, this particular case is not 
handled there. As progress is made here, please also keep an eye on the 
progress in YARN-3226.





[jira] [Commented] (YARN-4406) RM Web UI continues to show decommissioned nodes even after RM restart

2015-12-02 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036410#comment-15036410
 ] 

Daniel Templeton commented on YARN-4406:


That's the simplest resolution, but I was actually leaning the other direction: 
making the list of decommissioned nodes include the full excludes list.  I 
guess it comes down to how we define decommissioned in the UI.  I interpret the 
excludes list as the canonical list of decommissioned nodes.
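
As a rough illustration of that direction, the sketch below builds the displayed list from the excludes list plus any nodes decommissioned since the last restart, so that the list and the count come from the same set. {{DecommissionedView}} and {{decommissionedHosts}} are hypothetical names, and treating the result as a union of the two sources is an assumption; none of this is existing ResourceManager code.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; not part of the ResourceManager.
final class DecommissionedView {
    // Returns every host on the excludes list plus every host that was
    // decommissioned since the last restart, sorted for stable display.
    static Set<String> decommissionedHosts(Set<String> excludedHosts,
                                           Set<String> inactiveDecommissionedHosts) {
        Set<String> all = new TreeSet<>(excludedHosts);
        all.addAll(inactiveDecommissionedHosts);
        return all;
    }
}
```

If the UI took both the list and the count from this one set, they could not disagree after a restart.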

> RM Web UI continues to show decommissioned nodes even after RM restart
> --
>
> Key: YARN-4406
> URL: https://issues.apache.org/jira/browse/YARN-4406
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Ray Chiang
>Assignee: Kuhu Shukla
>Priority: Minor
>
> If you start up a cluster, decommission a NodeManager, and restart the RM, 
> the decommissioned node list will still show a positive number (1 in the case 
> of 1 node) and if you click on the list, it will be empty.





[jira] [Commented] (YARN-4406) RM Web UI continues to show decommissioned nodes even after RM restart

2015-12-02 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036389#comment-15036389
 ] 

Ray Chiang commented on YARN-4406:
--

That looks good to me.



[jira] [Commented] (YARN-4406) RM Web UI continues to show decommissioned nodes even after RM restart

2015-12-02 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036616#comment-15036616
 ] 

Kuhu Shukla commented on YARN-4406:
---

I agree. I was thinking about that too. During {{registerwithRM()}} we throw a 
YarnException while on the ResourceTrackerService side we just send NodeAction 
as SHUTDOWN. We could in fact update InactiveRMNode list with this node, so 
that it is consistent. Let me know what you think. I will put up a patch soon.
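
A minimal sketch of that idea, using simplified stand-in types: when a registration is refused because the host is on the excludes list, the node is also recorded as decommissioned in the inactive map. {{RegistrationSketch}}, its nested enums, and its methods are hypothetical and only mirror the names used in this discussion; the real ResourceTrackerService and RMContext APIs are not used here.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the registration path; not Hadoop code.
final class RegistrationSketch {
    enum NodeAction { NORMAL, SHUTDOWN }
    enum NodeState { RUNNING, DECOMMISSIONED }

    private final Set<String> excludedHosts;
    // Stand-in for the inactive node list, keyed by host name.
    private final Map<String, NodeState> inactiveNodes = new ConcurrentHashMap<>();

    RegistrationSketch(Set<String> excludedHosts) {
        this.excludedHosts = excludedHosts;
    }

    // Refuses excluded hosts and records them as decommissioned, so the
    // inactive list stays consistent with the excludes list after a restart.
    NodeAction register(String host) {
        if (excludedHosts.contains(host)) {
            inactiveNodes.put(host, NodeState.DECOMMISSIONED);
            return NodeAction.SHUTDOWN;
        }
        return NodeAction.NORMAL;
    }

    Map<String, NodeState> getInactiveNodes() {
        return inactiveNodes;
    }
}
```

Note that in this model an excluded node only appears in the inactive map once it actually tries to register, so hosts that never come back would still be missing from the list.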



[jira] [Commented] (YARN-4406) RM Web UI continues to show decommissioned nodes even after RM restart

2015-12-01 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034604#comment-15034604
 ] 

Daniel Templeton commented on YARN-4406:


Looking at the {{NodesListManager}}, it looks to me like 
{{ClusterMetrics.getMetrics().getDecomissionedNMs()}} is set to the size of the 
excludes list, but {{getUnusableNodes()}} returns the list of nodes that have 
been decommissioned since the last reboot.



[jira] [Commented] (YARN-4406) RM Web UI continues to show decommissioned nodes even after RM restart

2015-12-01 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034480#comment-15034480
 ] 

Kuhu Shukla commented on YARN-4406:
---

[~rchiang] thank you for reporting this. I think the root cause is:

In {{updateMetricsForDeactivatedNode}}, the re-addition of a node does not 
decrement the decommissioned node count as expected. It switches on the node's 
previous state, and there is no case for decommissioned nodes.



[jira] [Commented] (YARN-4406) RM Web UI continues to show decommissioned nodes even after RM restart

2015-12-01 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034490#comment-15034490
 ] 

Ray Chiang commented on YARN-4406:
--

Thanks for letting me know.   I'll take a look at that.



[jira] [Commented] (YARN-4406) RM Web UI continues to show decommissioned nodes even after RM restart

2015-12-01 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034647#comment-15034647
 ] 

Ray Chiang commented on YARN-4406:
--

So, would the right solution be that {{getUnusableNodes()}} should be "excludes 
list" aware?



[jira] [Commented] (YARN-4406) RM Web UI continues to show decommissioned nodes even after RM restart

2015-12-01 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034482#comment-15034482
 ] 

Kuhu Shukla commented on YARN-4406:
---

Spoke too soon. On trunk, {{updateMetricsForRejoinedNode}} should handle that.



[jira] [Commented] (YARN-4406) RM Web UI continues to show decommissioned nodes even after RM restart

2015-12-01 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035223#comment-15035223
 ] 

Naganarasimha G R commented on YARN-4406:
-

Hi [~rchiang] & [~kshukla],
YARN-3102 is also for the same issue. I had stopped working on it earlier 
because I suspected that YARN-914 (or one of its sub-JIRAs) might affect or 
take care of this issue. If you have already narrowed down the cause, feel 
free to assign YARN-3102 to yourselves and close this issue.
