[
https://issues.apache.org/jira/browse/YARN-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064513#comment-15064513
]
Wangda Tan commented on YARN-4454:
----------------------------------
[~bibinchundatt], thanks for reporting and looking at the issue.
The root cause is that when the RM restarts for the first time, it
generates a mirror file containing the complete node->label mappings:
{code}
node1:port=x
node1=y
{code}
When the RM is restarted again, it loads these mappings. Because node1:port
is loaded first, node1=y is applied afterwards and overwrites the label set for node1:port.
In:
{{org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager#checkReplaceLabelsOnNode}}
Instead of iterating over the map directly:
{code}
for (Entry<NodeId, Set<String>> entry : replaceLabelsToNode.entrySet()) {
NodeId nodeId = entry.getKey();
{code}
We should sort the entries so that nodes without a port are handled
before nodes with a port specified, which prevents the overwrite.
Does that make sense to you?
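To illustrate the ordering problem and the proposed sort, here is a minimal, self-contained sketch. It is not the actual CommonNodeLabelsManager code: plain strings such as "node1:8041" stand in for NodeId, the port 8041 is made up, and the class and method names (NodeLabelOrderSketch, replay, isHostOnly) are hypothetical. It only assumes what is described above, namely that a host-only entry applies to every node on that host and a host:port entry applies to that one node.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NodeLabelOrderSketch {

    // Hypothetical helper: a key without ':' is a host-only entry (e.g. "node1").
    public static boolean isHostOnly(String key) {
        return key.indexOf(':') < 0;
    }

    // Replays stored entries against the running nodes and returns the
    // effective label per node. Later entries overwrite earlier ones.
    public static Map<String, String> replay(Map<String, String> stored,
                                             List<String> runningNodes,
                                             boolean hostOnlyFirst) {
        List<Map.Entry<String, String>> entries = new ArrayList<>(stored.entrySet());
        if (hostOnlyFirst) {
            // Stable sort: host-only entries first, host:port entries after,
            // so the more specific entry is applied last.
            entries.sort((a, b) -> Boolean.compare(
                    !isHostOnly(a.getKey()), !isHostOnly(b.getKey())));
        }
        Map<String, String> effective = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : entries) {
            String key = entry.getKey();
            for (String node : runningNodes) {
                boolean matches = isHostOnly(key)
                        ? node.startsWith(key + ":")  // host-only: every node on the host
                        : node.equals(key);           // host:port: exactly this node
                if (matches) {
                    effective.put(node, entry.getValue());
                }
            }
        }
        return effective;
    }

    public static void main(String[] args) {
        // Same order as the mirror file in the comment above.
        Map<String, String> stored = new LinkedHashMap<>();
        stored.put("node1:8041", "x");
        stored.put("node1", "y");
        List<String> nodes = Arrays.asList("node1:8041");

        System.out.println(replay(stored, nodes, false)); // {node1:8041=y}
        System.out.println(replay(stored, nodes, true));  // {node1:8041=x}
    }
}
```

With the file order, node1=y is applied last and wins; with host-only entries sorted first, node1:8041 keeps x.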
> NM to nodelabel mapping going wrong after RM restart
> ----------------------------------------------------
>
> Key: YARN-4454
> URL: https://issues.apache.org/jira/browse/YARN-4454
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Bibin A Chundatt
> Assignee: Bibin A Chundatt
> Priority: Critical
> Attachments: test.patch
>
>
> *The node label mapping for a NodeManager goes wrong if a combination of
> hostname and then NodeId is used to update the node label mapping*
> *Steps to reproduce*
> 1. Create a cluster with 2 NMs
> 2. Add labels X and Y to the cluster
> 3. Replace the label of node 1 using <HOSTNAME1:PORT>,x
> 4. Replace the label of node 1 using <HOSTNAME1>,y
> 5. Replace the label of node 1 again using <HOSTNAME1:PORT>,x
> Check the cluster label mapping: HOSTNAME1 will be mapped to X.
> Now restart the RM 2 times: the node label mapping of HOSTNAME1:PORT changes to Y.
> {noformat}
> 2015-12-14 17:17:54,901 INFO
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels:
> [<ResourcePool_1:exclusivity=true>,<ResourcePool_null:exclusivity=true>]
> 2015-12-14 17:17:54,905 INFO
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: REPLACE labels on
> nodes:
> 2015-12-14 17:17:54,906 INFO
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager:
> NM=host-10-19-92-188:64318, labels=[ResourcePool_1]
> 2015-12-14 17:17:54,906 INFO
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager:
> NM=host-10-19-92-188:0, labels=[ResourcePool_null]
> 2015-12-14 17:17:54,906 INFO
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager:
> NM=host-10-19-92-187:64318, labels=[ResourcePool_null]
> {noformat}