[
https://issues.apache.org/jira/browse/YARN-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281350#comment-17281350
]
Eric Badger commented on YARN-10501:
------------------------------------
[~caozhiqiang], I have some comments on the patch as it is.
bq. + throw new IOException("Should create host before add labels.");
Grammar nit: This should be something like "Cannot add labels to a host that
does not exist. Create the host before adding labels to it"
bq. + throw new IOException("Should create host before remove labels.");
Grammar nit: This should be something like "Cannot remove labels from a host
that does not exist. Create the host before removing labels from it."
The unit test does not fail without your patch. A unit test should fail without
the patch and pass with it; otherwise it doesn't actually prove that the patch
did anything. From the looks of the test, you are only adding a node label to a
single node, not multiple, so the broken functionality is not exercised by your
unit test.
> Can't remove all node labels after add node label without nodemanager port
> --------------------------------------------------------------------------
>
> Key: YARN-10501
> URL: https://issues.apache.org/jira/browse/YARN-10501
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 3.4.0
> Reporter: caozhiqiang
> Assignee: caozhiqiang
> Priority: Critical
> Attachments: YARN-10501.002.patch, YARN-10501.003.patch
>
>
> When a label is added to a node without specifying the nodemanager port, or
> with WILDCARD_PORT (0), the label info cannot be fully removed from that node.
> Steps to reproduce:
> {code:java}
> 1.yarn rmadmin -addToClusterNodeLabels "cpunode(exclusive=true)"
> 2.yarn rmadmin -replaceLabelsOnNode "server001=cpunode"
> 3.curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":["server001:0","server001:45454"],"partitionInfo":{"resourceAvailable":{"memory":"510","vCores":"1","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"510"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"1"}]}}}}}}}
> 4.yarn rmadmin -replaceLabelsOnNode "server001"
> 5.curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":"server001:45454","partitionInfo":{"resourceAvailable":{"memory":"0","vCores":"0","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"0"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"0"}]}}}}}}}
> {code}
> As you can see, after step 4 removes the node's labels, the label info is
> still present in the node info.
> {code:java}
> 641 case REPLACE:
> 642 replaceNodeForLabels(nodeId, host.labels, labels);
> 643 replaceLabelsForNode(nodeId, host.labels, labels);
> 644 host.labels.clear();
> 645 host.labels.addAll(labels);
> 646 for (Node node : host.nms.values()) {
> 647 replaceNodeForLabels(node.nodeId, node.labels, labels);
> 649 node.labels = null;
> 650 }
> 651 break;{code}
> The cause is at line 647. When labels are added to a node without a port, both
> the port-0 entry and the real NM port entry are added to the node info. When
> the labels are later removed, the node.labels argument at line 647 is null, so
> the old label is not removed.
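
The mechanism described in the report can be reproduced with a small
self-contained model. This is only a sketch under stated assumptions: the class
ReplaceLabelsSketch, the method nodesLeftAfterRemoval and the
fallBackToHostLabels switch are hypothetical names that do not exist in YARN,
the data structures are heavily simplified, and the fallback shown is just one
possible way to avoid the stale entry, not necessarily what the attached
patches do.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Simplified, hypothetical model of the REPLACE case quoted in the report.
class ReplaceLabelsSketch {
    static class Node {
        final String nodeId;
        Set<String> labels;            // null: the NM has no labels of its own
        Node(String nodeId) { this.nodeId = nodeId; }
    }

    static class Host {
        final Set<String> labels = new HashSet<>();
        final Map<String, Node> nms = new HashMap<>();
    }

    // label name -> node IDs currently mapped to that label
    final Map<String, Set<String>> labelToNodes = new HashMap<>();

    // Removes nodeId from oldLabels (skipped when oldLabels is null),
    // then adds it to newLabels.
    void replaceNodeForLabels(String nodeId, Set<String> oldLabels, Set<String> newLabels) {
        if (oldLabels != null) {
            for (String label : oldLabels) {
                Set<String> nodes = labelToNodes.get(label);
                if (nodes != null) {
                    nodes.remove(nodeId);
                }
            }
        }
        for (String label : newLabels) {
            labelToNodes.computeIfAbsent(label, k -> new HashSet<>()).add(nodeId);
        }
    }

    // Mirrors the quoted REPLACE branch. With fallBackToHostLabels == false the
    // per-NM call receives null as the old labels, as described in the report.
    void replaceOnHost(Host host, String hostNodeId, Set<String> labels,
                       boolean fallBackToHostLabels) {
        Set<String> previousHostLabels = new HashSet<>(host.labels);
        replaceNodeForLabels(hostNodeId, host.labels, labels);
        host.labels.clear();
        host.labels.addAll(labels);
        for (Node node : host.nms.values()) {
            Set<String> old = node.labels;
            if (old == null && fallBackToHostLabels) {
                old = previousHostLabels;   // assumption: one possible fix
            }
            replaceNodeForLabels(node.nodeId, old, labels);
            node.labels = null;
        }
    }

    // Runs steps 2 and 4 of the report and returns the nodes still mapped to "cpunode".
    static Set<String> nodesLeftAfterRemoval(boolean fallBackToHostLabels) {
        ReplaceLabelsSketch mgr = new ReplaceLabelsSketch();
        Host host = new Host();
        host.nms.put("server001:45454", new Node("server001:45454"));
        mgr.replaceOnHost(host, "server001:0", Set.of("cpunode"), fallBackToHostLabels);
        mgr.replaceOnHost(host, "server001:0", Set.of(), fallBackToHostLabels);
        return new TreeSet<>(mgr.labelToNodes.getOrDefault("cpunode", Set.of()));
    }

    public static void main(String[] args) {
        System.out.println(nodesLeftAfterRemoval(false)); // [server001:45454]
        System.out.println(nodesLeftAfterRemoval(true));  // []
    }
}
```

In this model, a test that asserts the label mapping is empty after step 4
fails without the fallback and passes with it, which is the property the review
comment asks the real unit test to have.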