Wangda Tan commented on YARN-4454:

[~bibinchundatt], thanks for reporting and looking at the issue. 

The root cause of this issue is that when the RM restarts for the first time, 
it generates a mirror file that holds the complete node->label mappings:

When we restart the RM again, it loads these mappings, but node1:port is 
loaded first, so node1=y is applied afterwards and overwrites it.


Instead of iterating the map directly:
    for (Entry<NodeId, Set<String>> entry : replaceLabelsToNode.entrySet()) {
      NodeId nodeId = entry.getKey();
we should sort the entries so that nodes without a port are handled before 
nodes with a port specified, which prevents the overwrite.
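
A rough sketch of that ordering, to make the proposal concrete. This is a 
simplified model, not the actual CommonNodeLabelsManager code: Node is a 
hypothetical stand-in for NodeId, hostOnlyFirst and replay are made-up helper 
names, and it assumes that port 0 means "no port specified" (as the 
host-10-19-92-188:0 entry in the log suggests) and that a host-only entry 
relabels every NM on that host.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NodeLabelReplaySketch {
    // Assumption: port 0 stands for "host only, no port specified".
    static final int WILDCARD_PORT = 0;

    /** Hypothetical stand-in for YARN's NodeId (host plus port). */
    static final class Node {
        final String host;
        final int port;
        Node(String host, int port) { this.host = host; this.port = port; }
    }

    /** Returns the entries ordered so that host-only nodes come before host:port nodes. */
    static List<Map.Entry<Node, String>> hostOnlyFirst(Map<Node, String> mappings) {
        List<Map.Entry<Node, String>> sorted = new ArrayList<>(mappings.entrySet());
        // false sorts before true; List.sort is stable, so order within each group is kept.
        sorted.sort((a, b) -> Boolean.compare(
                a.getKey().port != WILDCARD_PORT,
                b.getKey().port != WILDCARD_PORT));
        return sorted;
    }

    /** Replays the entries in the given order; later entries overwrite earlier ones. */
    static Map<String, String> replay(List<Map.Entry<Node, String>> entries, List<Node> runningNms) {
        Map<String, String> labels = new LinkedHashMap<>();
        for (Map.Entry<Node, String> e : entries) {
            Node n = e.getKey();
            for (Node nm : runningNms) {
                // A host-only entry matches every NM on the host; host:port matches one NM.
                boolean match = nm.host.equals(n.host)
                        && (n.port == WILDCARD_PORT || n.port == nm.port);
                if (match) {
                    labels.put(nm.host + ":" + nm.port, e.getValue());
                }
            }
        }
        return labels;
    }
}
```

In this model, replaying node1:port=x followed by node1=y in plain iteration 
order leaves the NM labelled y, while replaying the hostOnlyFirst order leaves 
it labelled x.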

Does this make sense to you?

> NM to nodelabel mapping going wrong after RM restart
> ----------------------------------------------------
>                 Key: YARN-4454
>                 URL: https://issues.apache.org/jira/browse/YARN-4454
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>            Priority: Critical
>         Attachments: test.patch
> *Node label mapping with NodeManager goes wrong if a combination of 
> hostname and then NodeId is used to update the node label mapping*
> *Steps to reproduce*
> 1. Create a cluster with 2 NMs
> 2. Add labels X,Y to the cluster
> 3. Replace the label of node 1 using <HOSTNAME1:PORT>,x
> 4. Replace the label of node 1 using <HOSTNAME1>,y
> 5. Replace the label of node 1 again using <HOSTNAME1:PORT>,x
> Check the cluster label mapping: HOSTNAME1 will be mapped to X.
> Now restart the RM 2 times: the node label mapping of HOSTNAME1:PORT changes to Y.
> {noformat}
> 2015-12-14 17:17:54,901 INFO 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: 
> [<ResourcePool_1:exclusivity=true>,<ResourcePool_null:exclusivity=true>]
> 2015-12-14 17:17:54,905 INFO 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: REPLACE labels on 
> nodes:
> 2015-12-14 17:17:54,906 INFO 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager:   
> NM=host-10-19-92-188:64318, labels=[ResourcePool_1]
> 2015-12-14 17:17:54,906 INFO 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager:   
> NM=host-10-19-92-188:0, labels=[ResourcePool_null]
> 2015-12-14 17:17:54,906 INFO 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager:   
> NM=host-10-19-92-187:64318, labels=[ResourcePool_null]
> {noformat}
