[jira] [Updated] (YARN-1071) ResourceManager's decommissioned and lost node count is 0 after restart

2014-02-20 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1071:
--

Attachment: YARN-1071.4.patch

 ResourceManager's decommissioned and lost node count is 0 after restart
 ---

 Key: YARN-1071
 URL: https://issues.apache.org/jira/browse/YARN-1071
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Srimanth Gunturi
Assignee: Jian He
 Attachments: YARN-1071.1.patch, YARN-1071.2.patch, YARN-1071.3.patch, 
 YARN-1071.4.patch


 I had 6 nodes in a cluster with 2 NMs stopped. Then I put a host into YARN's 
 {{yarn.resourcemanager.nodes.exclude-path}}. After running {{yarn rmadmin 
 -refreshNodes}}, RM's JMX correctly showed decommissioned node count:
 {noformat}
 NumActiveNMs : 3,
 NumDecommissionedNMs : 1,
 NumLostNMs : 2,
 NumUnhealthyNMs : 0,
 NumRebootedNMs : 0
 {noformat}
 After restarting RM, the counts were shown as below in JMX.
 {noformat}
 NumActiveNMs : 3,
 NumDecommissionedNMs : 0,
 NumLostNMs : 0,
 NumUnhealthyNMs : 0,
 NumRebootedNMs : 0
 {noformat}
 Notice that the lost and decommissioned NM counts are both 0.
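
One way to check for this symptom is to compare the NM counters before and after an RM restart. The following is a minimal sketch, not part of the original report. It assumes the RM's {{/jmx}} HTTP servlet returns JSON with a {{beans}} array and that the counters live in a bean named {{Hadoop:service=ResourceManager,name=ClusterMetrics}}; the helper names and the sample documents are hypothetical, with values copied from the two listings above.

```python
import json

# Assumed bean name; verify it against your own RM's /jmx output.
CLUSTER_METRICS_BEAN = "Hadoop:service=ResourceManager,name=ClusterMetrics"
COUNTERS = ("NumActiveNMs", "NumDecommissionedNMs", "NumLostNMs",
            "NumUnhealthyNMs", "NumRebootedNMs")


def node_counts(jmx_json):
    """Extract the NM counters from a /jmx JSON document (hypothetical helper)."""
    for bean in json.loads(jmx_json).get("beans", []):
        if bean.get("name") == CLUSTER_METRICS_BEAN:
            return {key: int(bean.get(key, 0)) for key in COUNTERS}
    raise KeyError(CLUSTER_METRICS_BEAN)


def counters_reset(before, after):
    """Return the counters that were non-zero before and are 0 after."""
    return [key for key in COUNTERS if before[key] > 0 and after[key] == 0]


def sample_jmx(active, decommissioned, lost):
    """Build a sample /jmx document; stands in for a real HTTP response."""
    bean = {"name": CLUSTER_METRICS_BEAN, "NumActiveNMs": active,
            "NumDecommissionedNMs": decommissioned, "NumLostNMs": lost,
            "NumUnhealthyNMs": 0, "NumRebootedNMs": 0}
    return json.dumps({"beans": [bean]})


before_restart = node_counts(sample_jmx(3, 1, 2))
after_restart = node_counts(sample_jmx(3, 0, 0))
print(counters_reset(before_restart, after_restart))
```

With the values from this report, the sketch flags {{NumDecommissionedNMs}} and {{NumLostNMs}} as reset by the restart.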



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1071) ResourceManager's decommissioned and lost node count is 0 after restart

2014-02-20 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1071:
--

Attachment: YARN-1071.5.patch



[jira] [Updated] (YARN-1071) ResourceManager's decommissioned and lost node count is 0 after restart

2014-02-20 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1071:
--

Attachment: YARN-1071.6.patch



[jira] [Updated] (YARN-1071) ResourceManager's decommissioned and lost node count is 0 after restart

2014-02-19 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1071:
--

Attachment: YARN-1071.2.patch



[jira] [Updated] (YARN-1071) ResourceManager's decommissioned and lost node count is 0 after restart

2014-02-19 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1071:
--

Attachment: YARN-1071.3.patch



[jira] [Updated] (YARN-1071) ResourceManager's decommissioned and lost node count is 0 after restart

2014-02-18 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1071:
--

Attachment: YARN-1071.1.patch

Uploaded a patch that sets the decommissioned node metrics when the exclude 
list is read.
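
The idea can be illustrated with a short sketch. This is not the code in YARN-1071.1.patch; it is a simplified illustration in Python. It assumes the exclude file holds whitespace-separated host names with {{#}} starting a comment, and the class and function names are hypothetical.

```python
class ClusterMetricsSketch:
    """Hypothetical stand-in for the RM's cluster metrics object."""

    def __init__(self):
        self.num_decommissioned_nms = 0


def parse_exclude_list(text):
    """Return the set of host names in exclude-file content.

    Assumed format: whitespace-separated host names, with '#' starting
    a comment that runs to the end of the line.
    """
    hosts = set()
    for line in text.splitlines():
        for token in line.split():
            if token.startswith("#"):
                break  # ignore the rest of the line
            hosts.add(token)
    return hosts


def refresh_nodes(exclude_text, metrics):
    """Re-read the exclude list and set the decommissioned count from it."""
    hosts = parse_exclude_list(exclude_text)
    metrics.num_decommissioned_nms = len(hosts)
    return hosts


metrics = ClusterMetricsSketch()
refresh_nodes("host4.example.com\n# maintenance notes\n", metrics)
print(metrics.num_decommissioned_nms)
```

Because the count is derived from the exclude list rather than from node state held in memory, it would survive an RM restart as long as the list is read again at startup.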




[jira] [Updated] (YARN-1071) ResourceManager's decommissioned and lost node count is 0 after restart

2013-08-16 Thread Srimanth Gunturi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srimanth Gunturi updated YARN-1071:
---

Priority: Critical  (was: Major)


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1071) ResourceManager's decommissioned and lost node count is 0 after restart

2013-08-16 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1071:
--

Priority: Major  (was: Critical)

bq. However I think at least the decommissioned count should be set based on the 
exclude file information. YARN already knows about the excluded hosts, as it 
knows to ignore their communication.

That seems reasonable.

Decreasing priority though.
