[ https://issues.apache.org/jira/browse/YARN-10791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Song Jiacheng updated YARN-10791:
---------------------------------
    Description: 
We are performing a rolling upgrade of YARN from 2.6.0 to 3.2.1, and we hit this
exception while upgrading the NMs.

When we exclude a node and call refreshNodes gracefully, all the MR AMs fail
with the exception below.
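For reference, the graceful decommission itself is triggered roughly like this
(the host name is a placeholder, and the exclude file is whatever
yarn.resourcemanager.nodes.exclude-path points to):

    # Add the node to the RM exclude file, then refresh with a graceful timeout.
    echo "nm-host-01.example.com" >> /etc/hadoop/conf/yarn.exclude
    yarn rmadmin -refreshNodes -g 3600 -client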

2021-05-28 11:36:35,790 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN CONTACTING RM.
java.lang.NullPointerException
        at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.handleUpdatedNodes(RMContainerAllocator.java:883)
        at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:821)
        at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:316)
        at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:282)
        at java.lang.Thread.run(Thread.java:745)

This happens because we gracefully decommission nodes while the jobs are still
running the 2.6 MR framework.

handleUpdatedNodes in 2.6 MR cannot recognize the node state DECOMMISSIONING
(that state only exists in later releases, where graceful decommission was
added), so on the 2.6 side the report's node state presumably comes back null,
which is what the allocator trips over.

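As I understand it, the failure mode looks roughly like the following
self-contained sketch. This is not the real Hadoop code: the enum and report
classes are simplified stand-ins, and the real 2.6 allocator hits the null in
an isUnusable()-style check at the line shown in the stack trace.

    import java.util.Collections;
    import java.util.List;

    public class DecommissioningNpeSketch {

      // NodeState as a 2.6 client knows it -- no DECOMMISSIONING constant.
      enum OldNodeState {
        NEW, RUNNING, UNHEALTHY, DECOMMISSIONED, LOST, REBOOTED;

        boolean isUnusable() {
          return this == UNHEALTHY || this == DECOMMISSIONED || this == LOST;
        }
      }

      static class NodeReport {
        private final OldNodeState state;
        NodeReport(OldNodeState state) { this.state = state; }
        OldNodeState getNodeState() { return state; }
      }

      // Maps the RM's wire value to the local enum; an unknown name (such as
      // DECOMMISSIONING from a 3.2 RM) surfaces as null, much like an
      // unrecognized/unset proto enum field does.
      static OldNodeState fromWire(String wireName) {
        try {
          return OldNodeState.valueOf(wireName);
        } catch (IllegalArgumentException unknown) {
          return null;
        }
      }

      // Rough shape of the loop in RMContainerAllocator.handleUpdatedNodes.
      static void handleUpdatedNodes(List<NodeReport> updatedNodes) {
        for (NodeReport nr : updatedNodes) {
          OldNodeState nodeState = nr.getNodeState(); // null for DECOMMISSIONING
          if (nodeState.isUnusable()) {               // NullPointerException here
            System.out.println("marking containers on this node as unusable");
          }
        }
      }

      public static void main(String[] args) {
        NodeReport fromNewRm = new NodeReport(fromWire("DECOMMISSIONING"));
        handleUpdatedNodes(Collections.singletonList(fromNewRm)); // throws NPE
      }
    }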
So I added a config that decides whether the RM should send DECOMMISSIONING
node updates to the AMs at all.

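The actual change is in the attached patch; purely to illustrate the shape such
a guard could take, here is a hypothetical sketch. The config key, class, and
method names below are made up for illustration, not taken from the patch.

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;

    public class DecommissioningReportFilter {

      // Made-up config key; the real patch may name it differently.
      static final String SEND_DECOMMISSIONING_TO_AM =
          "yarn.resourcemanager.decommissioning-nodes.send-to-am.enabled";

      private final boolean sendDecommissioning;

      DecommissioningReportFilter(boolean sendDecommissioning) {
        this.sendDecommissioning = sendDecommissioning;
      }

      // Removes DECOMMISSIONING reports in place when the flag is off, so AMs
      // built before that state existed never see a value they cannot parse.
      void filter(List<NodeReport> updatedNodes) {
        if (sendDecommissioning) {
          return; // new-enough AMs understand DECOMMISSIONING
        }
        Iterator<NodeReport> it = updatedNodes.iterator();
        while (it.hasNext()) {
          if (it.next().getState() == NodeState.DECOMMISSIONING) {
            it.remove();
          }
        }
      }

      // Minimal local stand-ins so the sketch compiles on its own.
      enum NodeState {
        NEW, RUNNING, UNHEALTHY, DECOMMISSIONING, DECOMMISSIONED, LOST, REBOOTED
      }

      static class NodeReport {
        private final NodeState state;
        NodeReport(NodeState state) { this.state = state; }
        NodeState getState() { return state; }
      }

      public static void main(String[] args) {
        List<NodeReport> updates = new ArrayList<>();
        updates.add(new NodeReport(NodeState.DECOMMISSIONING));
        updates.add(new NodeReport(NodeState.RUNNING));
        new DecommissioningReportFilter(false).filter(updates);
        System.out.println(updates.size() + " report(s) left"); // 1 report(s) left
      }
    }

A flag like this could default to off during the upgrade window and be switched
on once every job runs the new MR framework.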
I am not sure whether this counts as a bug; I am just raising a possible
solution for this situation.


> Graceful decommission causes NPE during rolling upgrade from 2.6 to 3.2
> ----------------------------------------------------------------------
>
>                 Key: YARN-10791
>                 URL: https://issues.apache.org/jira/browse/YARN-10791
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: RM
>    Affects Versions: 3.2.1
>            Reporter: Song Jiacheng
>            Priority: Minor
>         Attachments: YARN-10791.v1.patch


