[jira] [Updated] (YARN-5837) NPE when getting node status of a decommissioned node after an RM restart

Robert Kanter (JIRA) Fri, 04 Nov 2016 11:12:33 -0700

     [ https://issues.apache.org/jira/browse/YARN-5837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Robert Kanter updated YARN-5837:
--------------------------------
    Attachment: YARN-5837.001.patch
                YARN-5837.branch-2.7.001.patch

The patch fixes the NPE by passing in a {{Resources}} object with 0 memory 
and 0 vcores.  It also sets the version to "unknown" instead of "null" so that 
it displays more cleanly.  The patch updates a test as well, and I've verified 
the fix on a cluster.
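
As a rough illustration of why a zero-valued object avoids the NPE, here is a 
minimal self-contained sketch.  The classes {{ResourceSketch}}, {{NodeSketch}}, 
and {{NodeReportSketch}} are hypothetical stand-ins, not the Hadoop YARN 
classes, and the "Version" line in the output is made up for the example.  It 
assumes the NPE comes from dereferencing a null resource while formatting the 
report, which is an inference from the stack trace rather than something the 
patch description states.

```java
// Hypothetical stand-ins for illustration only; these are NOT the Hadoop YARN classes.
final class ResourceSketch {
    final long memoryMb;
    final int vcores;

    ResourceSketch(long memoryMb, int vcores) {
        this.memoryMb = memoryMb;
        this.vcores = vcores;
    }
}

final class NodeSketch {
    final String nodeId;
    final ResourceSketch capability; // may be null for a placeholder node
    final String version;            // may be null for a placeholder node

    NodeSketch(String nodeId, ResourceSketch capability, String version) {
        this.nodeId = nodeId;
        this.capability = capability;
        this.version = version;
    }
}

class NodeReportSketch {
    // Formats a report the naive way: it dereferences capability with no null check.
    static String formatReport(NodeSketch node) {
        return "Node-Id : " + node.nodeId + "\n"
             + "Memory-Capacity : " + node.capability.memoryMb + "MB\n"
             + "CPU-Capacity : " + node.capability.vcores + " vcores\n"
             + "Version : " + node.version + "\n";
    }

    // Assumed "before" state: the placeholder node carries no resource and no version.
    static NodeSketch placeholderBefore(String host) {
        return new NodeSketch(host + ":-1", null, null);
    }

    // "After" state as described above: 0 memory, 0 vcores, version "unknown".
    static NodeSketch placeholderAfter(String host) {
        return new NodeSketch(host + ":-1", new ResourceSketch(0, 0), "unknown");
    }

    public static void main(String[] args) {
        // The zero-valued placeholder formats without error.
        System.out.print(formatReport(placeholderAfter("192.168.1.69")));
        try {
            formatReport(placeholderBefore("192.168.1.69"));
        } catch (NullPointerException e) {
            System.out.println("NPE when the capability is null");
        }
    }
}
```

The sketch puts the safe default where the placeholder is constructed instead 
of adding null checks to the code that prints it, which matches the approach 
described above.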

The trunk patch applies cleanly to trunk, branch-2, and branch-2.8 (with some 
fuzzing by the {{patch}} command).  The branch-2.7 patch applies to branch-2.7.

> NPE when getting node status of a decommissioned node after an RM restart
> -------------------------------------------------------------------------
>
>                 Key: YARN-5837
>                 URL: https://issues.apache.org/jira/browse/YARN-5837
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.7.3, 3.0.0-alpha1
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>         Attachments: YARN-5837.001.patch, YARN-5837.branch-2.7.001.patch
>
>
> If you decommission a node, the {{yarn node}} command shows it like this:
> {noformat}
> >> bin/yarn node -list -all
> 2016-11-04 08:54:37,169 INFO client.RMProxy: Connecting to ResourceManager at 0.0.0.0/0.0.0.0:8032
> Total Nodes:1
>            Node-Id      Node-State  Node-Http-Address  Number-of-Running-Containers
> 192.168.1.69:57560  DECOMMISSIONED  192.168.1.69:8042                             0
> {noformat}
> And a full report like this:
> {noformat}
> >> bin/yarn node -status 192.168.1.69:57560
> 2016-11-04 08:55:08,928 INFO client.RMProxy: Connecting to ResourceManager at 0.0.0.0/0.0.0.0:8032
> Node Report :
>       Node-Id : 192.168.1.69:57560
>       Rack : /default-rack
>       Node-State : DECOMMISSIONED
>       Node-Http-Address : 192.168.1.69:8042
>       Last-Health-Update : Fri 04/Nov/16 08:53:58:802PDT
>       Health-Report :
>       Containers : 0
>       Memory-Used : 0MB
>       Memory-Capacity : 8192MB
>       CPU-Used : 0 vcores
>       CPU-Capacity : 8 vcores
>       Node-Labels :
>       Resource Utilization by Node :
>       Resource Utilization by Containers : PMem:0 MB, VMem:0 MB, VCores:0.0
> {noformat}
> If you then restart the ResourceManager, you get this report:
> {noformat}
> >> bin/yarn node -list -all
> 2016-11-04 08:57:18,512 INFO client.RMProxy: Connecting to ResourceManager at 0.0.0.0/0.0.0.0:8032
> Total Nodes:4
>            Node-Id      Node-State  Node-Http-Address  Number-of-Running-Containers
>    192.168.1.69:-1  DECOMMISSIONED    192.168.1.69:-1                             0
> {noformat}
> And when you try to get the full report on the now "-1" node, you get an NPE:
> {noformat}
> >> bin/yarn node -status 192.168.1.69:-1
> 2016-11-04 08:57:57,385 INFO client.RMProxy: Connecting to ResourceManager at 0.0.0.0/0.0.0.0:8032
> Exception in thread "main" java.lang.NullPointerException
>       at org.apache.hadoop.yarn.client.cli.NodeCLI.printNodeStatus(NodeCLI.java:296)
>       at org.apache.hadoop.yarn.client.cli.NodeCLI.run(NodeCLI.java:116)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>       at org.apache.hadoop.yarn.client.cli.NodeCLI.main(NodeCLI.java:63)
> {noformat}



