[ https://issues.apache.org/jira/browse/YARN-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064513#comment-15064513 ]
Wangda Tan commented on YARN-4454:
----------------------------------

[~bibinchundatt], thanks for reporting and looking into the issue.

The root cause is that when the RM restarts for the first time, it generates a mirror file containing the complete node->label mappings:
{code}
node1:port=x
node1=y
{code}

When the RM restarts again, it loads this mapping, but node1:port is loaded first, so node1=y overwrites the previous entry.

In {{org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager#checkReplaceLabelsOnNode}}, instead of iterating the map directly:
{code}
for (Entry<NodeId, Set<String>> entry : replaceLabelsToNode.entrySet()) {
  NodeId nodeId = entry.getKey();
{code}
we should sort the map so that nodes without a port are handled before nodes with a port specified, which avoids the overwrite.

Does that make sense to you?

> NM to nodelabel mapping going wrong after RM restart
> ----------------------------------------------------
>
>                 Key: YARN-4454
>                 URL: https://issues.apache.org/jira/browse/YARN-4454
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>            Priority: Critical
>         Attachments: test.patch
>
>
> *The node label mapping of a NodeManager goes wrong if a combination of hostname and then NodeId is used to update the node label mapping*
> *Steps to reproduce*
> 1. Create a cluster with 2 NMs
> 2. Add labels X,Y to the cluster
> 3. Replace the label of node 1 using <HOSTNAME1:PORT>,x
> 4. Replace the label of node 1 using <HOSTNAME1>,y
> 5. Again replace the label of node 1 using <HOSTNAME1:PORT>,x
> Check the cluster label mapping: HOSTNAME1 will be mapped to X
> Now restart the RM 2 times: the node label mapping of HOSTNAME1:PORT changes to Y
> {noformat}
> 2015-12-14 17:17:54,901 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [<ResourcePool_1:exclusivity=true>,<ResourcePool_null:exclusivity=true>]
> 2015-12-14 17:17:54,905 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: REPLACE labels on nodes:
> 2015-12-14 17:17:54,906 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager:   NM=host-10-19-92-188:64318, labels=[ResourcePool_1]
> 2015-12-14 17:17:54,906 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager:   NM=host-10-19-92-188:0, labels=[ResourcePool_null]
> 2015-12-14 17:17:54,906 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager:   NM=host-10-19-92-187:64318, labels=[ResourcePool_null]
> {noformat}
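
The ordering fix proposed in the comment (handle host-only entries before host:port entries) can be sketched roughly as follows. This is only an illustration, not the actual patch: {{NodeKey}}, {{hostOnlyFirst}} and {{replay}} are hypothetical names, the overwrite logic is a simplified model of the behaviour described above, and port 0 is assumed to mark a host-only entry, as the {{NM=host-10-19-92-188:0}} line in the log suggests.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class NodeLabelOrdering {
  // Assumption: port 0 marks a host-only entry (see "NM=host-10-19-92-188:0" in the log).
  static final int WILDCARD_PORT = 0;

  // Hypothetical stand-in for YARN's NodeId, so the sketch has no Hadoop dependency.
  record NodeKey(String host, int port) {}

  // Returns the entries with host-only keys first. List.sort is stable, so
  // entries of the same kind keep their original relative order.
  static List<Map.Entry<NodeKey, Set<String>>> hostOnlyFirst(
      Map<NodeKey, Set<String>> mapping) {
    List<Map.Entry<NodeKey, Set<String>>> entries = new ArrayList<>(mapping.entrySet());
    entries.sort((a, b) -> Boolean.compare(
        a.getKey().port() != WILDCARD_PORT,
        b.getKey().port() != WILDCARD_PORT));
    return entries;
  }

  // Simplified model of the replay: a host-only entry overwrites every
  // host:port entry already recorded for the same host.
  static Map<NodeKey, Set<String>> replay(List<Map.Entry<NodeKey, Set<String>>> entries) {
    Map<NodeKey, Set<String>> effective = new LinkedHashMap<>();
    for (Map.Entry<NodeKey, Set<String>> entry : entries) {
      NodeKey key = entry.getKey();
      if (key.port() == WILDCARD_PORT) {
        effective.replaceAll(
            (k, v) -> k.host().equals(key.host()) ? entry.getValue() : v);
      }
      effective.put(key, entry.getValue());
    }
    return effective;
  }

  public static void main(String[] args) {
    // Mirror-file order from the comment: node1:port=x is stored before node1=y.
    NodeKey nm = new NodeKey("node1", 64318);
    Map<NodeKey, Set<String>> mirror = new LinkedHashMap<>();
    mirror.put(nm, Set.of("x"));
    mirror.put(new NodeKey("node1", WILDCARD_PORT), Set.of("y"));

    // Replaying in stored order lets the host-only entry clobber node1:64318.
    System.out.println(replay(new ArrayList<>(mirror.entrySet())).get(nm)); // prints [y]

    // Replaying host-only entries first keeps the port-specific label.
    System.out.println(replay(hostOnlyFirst(mirror)).get(nm)); // prints [x]
  }
}
```

A stable sort on a single boolean key is used so that only the relative position of host-only versus host:port entries changes; whether the real fix should sort the map itself or a copy of its entries is left open here.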