[ https://issues.apache.org/jira/browse/YARN-5837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Kanter updated YARN-5837:
--------------------------------
Attachment: YARN-5837.001.patch
            YARN-5837.branch-2.7.001.patch
The patch fixes the problem by passing in a {{Resources}} object with 0 memory
and 0 vcores. It also sets the version to "unknown" instead of "null" so that
it displays more cleanly. The patch also updates a test, and I've verified the
fix on a cluster.
The trunk patch applies to trunk, branch-2, and branch-2.8 (with some fuzz
reported by the {{patch}} command). The branch-2.7 patch applies to branch-2.7.
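For anyone skimming, here is a rough, self-contained sketch of the idea behind the fix. It is not the patch itself and does not use the real YARN classes: {{SimpleResource}}, {{SimpleNodeReport}}, {{formatStatus}}, and {{placeholderReport}} are hypothetical names invented for illustration. The assumption is that the report printer dereferences the used and capacity resources without null checks, so a placeholder node needs a zero-valued resource rather than null.

```java
public class DecommissionedNodeReportSketch {

    /** Hypothetical stand-in for a resource: memory in MB plus virtual cores. */
    public static final class SimpleResource {
        public final long memoryMb;
        public final int vcores;

        public SimpleResource(long memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    /** Hypothetical stand-in for a node report; any field may be null. */
    public static final class SimpleNodeReport {
        public final String nodeId;
        public final SimpleResource used;
        public final SimpleResource capability;
        public final String version;

        public SimpleNodeReport(String nodeId, SimpleResource used,
                                SimpleResource capability, String version) {
            this.nodeId = nodeId;
            this.used = used;
            this.capability = capability;
            this.version = version;
        }
    }

    /**
     * Formats a few report fields. Like the assumed printer, it dereferences
     * used and capability without null checks, so null resources throw an NPE.
     */
    public static String formatStatus(SimpleNodeReport report) {
        StringBuilder sb = new StringBuilder();
        sb.append("Node-Id : ").append(report.nodeId).append('\n');
        sb.append("Memory-Used : ").append(report.used.memoryMb).append("MB\n");
        sb.append("Memory-Capacity : ").append(report.capability.memoryMb).append("MB\n");
        sb.append("CPU-Used : ").append(report.used.vcores).append(" vcores\n");
        sb.append("CPU-Capacity : ").append(report.capability.vcores).append(" vcores\n");
        return sb.toString();
    }

    /**
     * Builds a placeholder report for a node known only by host name:
     * port -1, zero memory and zero vcores, and version "unknown".
     */
    public static SimpleNodeReport placeholderReport(String host) {
        SimpleResource zero = new SimpleResource(0, 0);
        return new SimpleNodeReport(host + ":-1", zero, zero, "unknown");
    }

    public static void main(String[] args) {
        // Prints the formatted report for the placeholder node.
        System.out.print(formatStatus(placeholderReport("192.168.1.69")));
    }
}
```

With the placeholder built this way, formatting the report no longer fails, while a report constructed with null resources still throws in {{formatStatus}}.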
> NPE when getting node status of a decommissioned node after an RM restart
> -------------------------------------------------------------------------
>
> Key: YARN-5837
> URL: https://issues.apache.org/jira/browse/YARN-5837
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.7.3, 3.0.0-alpha1
> Reporter: Robert Kanter
> Assignee: Robert Kanter
> Attachments: YARN-5837.001.patch, YARN-5837.branch-2.7.001.patch
>
>
> If you decommission a node, the {{yarn node}} command shows it like this:
> {noformat}
> >> bin/yarn node -list -all
> 2016-11-04 08:54:37,169 INFO client.RMProxy: Connecting to ResourceManager at 0.0.0.0/0.0.0.0:8032
> Total Nodes:1
> Node-Id Node-State Node-Http-Address Number-of-Running-Containers
> 192.168.1.69:57560 DECOMMISSIONED 192.168.1.69:8042 0
> {noformat}
> And a full report like this:
> {noformat}
> >> bin/yarn node -status 192.168.1.69:57560
> 2016-11-04 08:55:08,928 INFO client.RMProxy: Connecting to ResourceManager at 0.0.0.0/0.0.0.0:8032
> Node Report :
> Node-Id : 192.168.1.69:57560
> Rack : /default-rack
> Node-State : DECOMMISSIONED
> Node-Http-Address : 192.168.1.69:8042
> Last-Health-Update : Fri 04/Nov/16 08:53:58:802PDT
> Health-Report :
> Containers : 0
> Memory-Used : 0MB
> Memory-Capacity : 8192MB
> CPU-Used : 0 vcores
> CPU-Capacity : 8 vcores
> Node-Labels :
> Resource Utilization by Node :
> Resource Utilization by Containers : PMem:0 MB, VMem:0 MB, VCores:0.0
> {noformat}
> If you then restart the ResourceManager, you get this report:
> {noformat}
> >> bin/yarn node -list -all
> 2016-11-04 08:57:18,512 INFO client.RMProxy: Connecting to ResourceManager at 0.0.0.0/0.0.0.0:8032
> Total Nodes:4
> Node-Id Node-State Node-Http-Address Number-of-Running-Containers
> 192.168.1.69:-1 DECOMMISSIONED 192.168.1.69:-1 0
> {noformat}
> And when you try to get the full report on the now "-1" node, you get an NPE:
> {noformat}
> >> bin/yarn node -status 192.168.1.69:-1
> 2016-11-04 08:57:57,385 INFO client.RMProxy: Connecting to ResourceManager at 0.0.0.0/0.0.0.0:8032
> Exception in thread "main" java.lang.NullPointerException
> at org.apache.hadoop.yarn.client.cli.NodeCLI.printNodeStatus(NodeCLI.java:296)
> at org.apache.hadoop.yarn.client.cli.NodeCLI.run(NodeCLI.java:116)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
> at org.apache.hadoop.yarn.client.cli.NodeCLI.main(NodeCLI.java:63)
> {noformat}