[jira] [Updated] (YARN-1071) ResourceManager's decommissioned and lost node count is 0 after restart
[ https://issues.apache.org/jira/browse/YARN-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He updated YARN-1071:
--------------------------

    Attachment: YARN-1071.4.patch

ResourceManager's decommissioned and lost node count is 0 after restart
-----------------------------------------------------------------------

                Key: YARN-1071
                URL: https://issues.apache.org/jira/browse/YARN-1071
            Project: Hadoop YARN
         Issue Type: Bug
         Components: resourcemanager
   Affects Versions: 2.1.0-beta
           Reporter: Srimanth Gunturi
           Assignee: Jian He
        Attachments: YARN-1071.1.patch, YARN-1071.2.patch, YARN-1071.3.patch, YARN-1071.4.patch

I had 6 nodes in a cluster with 2 NMs stopped. Then I put a host into YARN's {{yarn.resourcemanager.nodes.exclude-path}}. After running {{yarn rmadmin -refreshNodes}}, RM's JMX correctly showed decommissioned node count:
{noformat}
NumActiveNMs : 3,
NumDecommissionedNMs : 1,
NumLostNMs : 2,
NumUnhealthyNMs : 0,
NumRebootedNMs : 0
{noformat}
After restarting RM, the counts were shown as below in JMX.
{noformat}
NumActiveNMs : 3,
NumDecommissionedNMs : 0,
NumLostNMs : 0,
NumUnhealthyNMs : 0,
NumRebootedNMs : 0
{noformat}
Notice that the lost and decommissioned NM counts are both 0.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
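The discrepancy in the report can be checked mechanically. What follows is a minimal sketch, assuming the two JMX readings have already been captured as dictionaries keyed by the metric names quoted in the report. The helper name `dropped_counts` is hypothetical and not part of YARN; the values are copied from the report.

```python
# Metric names as they appear in the JMX output quoted in the report.
NODE_METRICS = (
    "NumActiveNMs",
    "NumDecommissionedNMs",
    "NumLostNMs",
    "NumUnhealthyNMs",
    "NumRebootedNMs",
)


def dropped_counts(before, after):
    """Return {metric: (before, after)} for every metric whose value decreased."""
    return {
        name: (before[name], after[name])
        for name in NODE_METRICS
        if after[name] < before[name]
    }


# Values copied from the two JMX readings in the report.
before_restart = {
    "NumActiveNMs": 3,
    "NumDecommissionedNMs": 1,
    "NumLostNMs": 2,
    "NumUnhealthyNMs": 0,
    "NumRebootedNMs": 0,
}
after_restart = {
    "NumActiveNMs": 3,
    "NumDecommissionedNMs": 0,
    "NumLostNMs": 0,
    "NumUnhealthyNMs": 0,
    "NumRebootedNMs": 0,
}

if __name__ == "__main__":
    for name, (old, new) in dropped_counts(before_restart, after_restart).items():
        print(f"{name}: {old} -> {new}")
```

With the reported values, only the decommissioned and lost counters are flagged; the active, unhealthy, and rebooted counters are unchanged across the restart.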
[jira] [Updated] (YARN-1071) ResourceManager's decommissioned and lost node count is 0 after restart
[ https://issues.apache.org/jira/browse/YARN-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He updated YARN-1071:
--------------------------

    Attachment: YARN-1071.5.patch
[jira] [Updated] (YARN-1071) ResourceManager's decommissioned and lost node count is 0 after restart
[ https://issues.apache.org/jira/browse/YARN-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He updated YARN-1071:
--------------------------

    Attachment: YARN-1071.6.patch
[jira] [Updated] (YARN-1071) ResourceManager's decommissioned and lost node count is 0 after restart
[ https://issues.apache.org/jira/browse/YARN-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He updated YARN-1071:
--------------------------

    Attachment: YARN-1071.2.patch
[jira] [Updated] (YARN-1071) ResourceManager's decommissioned and lost node count is 0 after restart
[ https://issues.apache.org/jira/browse/YARN-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He updated YARN-1071:
--------------------------

    Attachment: YARN-1071.3.patch
[jira] [Updated] (YARN-1071) ResourceManager's decommissioned and lost node count is 0 after restart
[ https://issues.apache.org/jira/browse/YARN-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He updated YARN-1071:
--------------------------

    Attachment: YARN-1071.1.patch

Upload a patch to set the decommissioned node metrics when the excluded list is read.
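The idea described in this comment, seeding the decommissioned counter from the exclude list when it is read, can be illustrated with a small sketch. This is not the attached patch (which targets the Java ResourceManager) and makes several assumptions: the exclude file is taken to hold whitespace-separated host names with `#` starting a comment, and `ClusterMetrics`, `read_excluded_hosts`, and `init_decommissioned_count` are hypothetical names chosen for illustration.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class ClusterMetrics:
    """Hypothetical stand-in for the RM's node counters."""
    num_decommissioned_nms: int = 0


def read_excluded_hosts(path):
    """Read host names from an exclude file.

    Assumed format: whitespace-separated host names; '#' starts a comment.
    """
    hosts = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.split("#", 1)[0]  # drop any trailing comment
            hosts.update(line.split())
    return hosts


def init_decommissioned_count(metrics, exclude_path):
    """Seed the decommissioned counter from the exclude list at startup."""
    metrics.num_decommissioned_nms = len(read_excluded_hosts(exclude_path))
    return metrics


if __name__ == "__main__":
    # Write a one-host exclude file, mirroring the single excluded host in the report.
    with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
        tmp.write("host4.example.com  # decommissioned\n")
    try:
        print(init_decommissioned_count(ClusterMetrics(), tmp.name))
    finally:
        os.remove(tmp.name)
```

The sketch covers only the decommissioned counter. The lost-node counter cannot be derived from the exclude file in the same way, since the file says nothing about nodes that merely stopped heartbeating.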
[jira] [Updated] (YARN-1071) ResourceManager's decommissioned and lost node count is 0 after restart
[ https://issues.apache.org/jira/browse/YARN-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Srimanth Gunturi updated YARN-1071:
-----------------------------------

    Priority: Critical  (was: Major)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1071) ResourceManager's decommissioned and lost node count is 0 after restart
[ https://issues.apache.org/jira/browse/YARN-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-1071:
------------------------------------------

    Priority: Major  (was: Critical)

bq. However I think at least the decommissioned count should be set based on the exclude file information. YARN already knows about the excluded hosts, as it knows to ignore their communication.

That seems reasonable. Decreasing priority though.