[
https://issues.apache.org/jira/browse/YARN-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254983#comment-17254983
]
caozhiqiang commented on YARN-10501:
------------------------------------
[~ebadger], thank you. I'll try to answer your questions.
*First, we use the yarn rmadmin -replaceLabelsOnNode command both to add labels
to a host/NM for the first time and to replace them, so everything goes through*
{code:java}
case REPLACE:{code}
1a. I am also unsure why a node without a port is added to the map. It may only
be there so that users can address port 0 and see its information. The Hadoop
documentation describes this as well:
[nodelabels|https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/NodeLabel.html]
{code:java}
Executing yarn rmadmin -replaceLabelsOnNode “node1[:port]=label1 node2=label2”
[-failOnUnknownNodes]. Added label1 to node1, label2 to node2. If user don’t
specify port, it adds the label to all NodeManagers running on the node. If
option -failOnUnknownNodes is set, this command will fail if specified nodes
are unknown.
{code}
1b. If the user doesn't specify a port, the label is added to all NodeManagers
running on the node, so the host's labels act as the labels of every NM on that
host. Each NM also needs its own labels. Another reason is that when a host has
no NMs, we can still use the host's labels.
{code:java}
protected Set<String> getLabelsByNode(NodeId nodeId, Map<String, Host> map) {
  Host host = map.get(nodeId.getHost());
  if (null == host) {
    return EMPTY_STRING_SET;
  }
  Node nm = host.nms.get(nodeId);
  if (null != nm && null != nm.labels) {
    return nm.labels;
  } else {
    return host.labels;
  }
}
{code}
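To make the fallback in 1b concrete, here is a minimal, self-contained sketch of the same lookup. It is not the real YARN code: LabelLookupSketch, its simplified Host and Node classes, the "host:port" string keys and the label name gpunode are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the host/NM label lookup (not the YARN classes).
class LabelLookupSketch {
  static class Node {
    Set<String> labels; // null means "no NM-specific labels were ever set"
  }

  static class Host {
    final Set<String> labels = new HashSet<>();
    final Map<String, Node> nms = new HashMap<>(); // keyed by "host:port"
  }

  // Returns the NM's own labels when it has any, otherwise the host's labels.
  static Set<String> getLabelsByNode(String hostName, int port, Map<String, Host> map) {
    Host host = map.get(hostName);
    if (host == null) {
      return Collections.emptySet();
    }
    Node nm = host.nms.get(hostName + ":" + port);
    if (nm != null && nm.labels != null) {
      return nm.labels;
    }
    return host.labels;
  }
}
```

In this model an NM whose labels field is null reports the host's labels, and an NM with its own label set shadows the host's labels.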
1c. Before labels are first added to a host/node, the node's labels are null. By
the time "case ADD:" runs, the host should already have been initialized through
"case REPLACE:", I think.
2a and 2b are the same as 1.
2c and 2d. If the port is 0, the host's labels are the same as the labels of the
nodes on this host. *Then the host's labels are added to the nodes' labels.*
2e and 2f. *The labels are set to null for each Node, and then the new labels
are set on the node asynchronously by the code below.*
{code:java}
newNMToLabels.put(nodeId, host.labels);
...
dispatcher.getEventHandler().handle(
    new UpdateNodeToLabelsMappingsEvent(newNMToLabels));
...
  public Node copy() {
    Node c = new Node(nodeId);
    if (labels != null) {
      c.labels =
          Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
      c.labels.addAll(labels);
    } else {
      c.labels = null;
    }
    c.resource = Resources.clone(resource);
    c.running = running;
    return c;
  }
}{code}
Could [~leftnoteasy], [~varunsaxena] and [~sunilg] give more information on
these code paths?
> Can't remove all node labels after add node label without nodemanager port
> --------------------------------------------------------------------------
>
> Key: YARN-10501
> URL: https://issues.apache.org/jira/browse/YARN-10501
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 3.4.0
> Reporter: caozhiqiang
> Assignee: caozhiqiang
> Priority: Critical
> Attachments: YARN-10501.002.patch, YARN-10501.003.patch
>
>
> When a label is added to nodes without a NodeManager port, or with
> WILDCARD_PORT (0), not all label info can be removed from these nodes afterwards.
> Steps to reproduce:
> {code:java}
> 1.yarn rmadmin -addToClusterNodeLabels "cpunode(exclusive=true)"
> 2.yarn rmadmin -replaceLabelsOnNode "server001=cpunode"
> 3.curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":["server001:0","server001:45454"],"partitionInfo":{"resourceAvailable":{"memory":"510","vCores":"1","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"510"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"1"}]}}}}}}}
> 4.yarn rmadmin -replaceLabelsOnNode "server001"
> 5.curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":"server001:45454","partitionInfo":{"resourceAvailable":{"memory":"0","vCores":"0","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"0"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"0"}]}}}}}}}
> {code}
> You can see that after step 4, which removes the NodeManager labels, the
> label info is still present in the node info.
> {code:java}
> 641 case REPLACE:
> 642 replaceNodeForLabels(nodeId, host.labels, labels);
> 643 replaceLabelsForNode(nodeId, host.labels, labels);
> 644 host.labels.clear();
> 645 host.labels.addAll(labels);
> 646 for (Node node : host.nms.values()) {
> 647 replaceNodeForLabels(node.nodeId, node.labels, labels);
> 649 node.labels = null;
> 650 }
> 651 break;{code}
> The cause is on line 647: when labels are added to a node without a port,
> both port 0 and the real NM port are added to the node info, and when the
> labels are removed, the node.labels parameter on line 647 is null, so the old
> label is not removed.
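The behaviour described in the issue can be reproduced with a small stand-alone model. This is only a sketch under stated assumptions: ReplaceLabelsSketch and its string node ids are hypothetical, and replaceNodeForLabels is assumed to skip the removal step when the old label set is null, which is what the analysis of line 647 implies. It is not the actual CommonNodeLabelsManager code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical single-host model of the REPLACE branch quoted in the issue.
class ReplaceLabelsSketch {
  static final int WILDCARD_PORT = 0;

  // label -> node ids ("host:port") currently mapped to that label
  final Map<String, Set<String>> labelToNodes = new HashMap<>();
  final Set<String> hostLabels = new HashSet<>();
  // NM id -> its own labels; null mirrors "node.labels = null"
  final Map<String, Set<String>> nmLabels = new LinkedHashMap<>();

  void registerNm(String nodeId) {
    nmLabels.put(nodeId, null);
  }

  // Assumed behaviour: old mappings are only removed when oldLabels is non-null.
  void replaceNodeForLabels(String nodeId, Set<String> oldLabels, Set<String> newLabels) {
    if (oldLabels != null) {
      for (String label : oldLabels) {
        Set<String> nodes = labelToNodes.get(label);
        if (nodes != null) {
          nodes.remove(nodeId);
        }
      }
    }
    for (String label : newLabels) {
      labelToNodes.computeIfAbsent(label, k -> new TreeSet<>()).add(nodeId);
    }
  }

  // Mirrors lines 641-651 for a request that names the host without a port.
  void replaceOnHost(String host, Set<String> labels) {
    replaceNodeForLabels(host + ":" + WILDCARD_PORT, hostLabels, labels);
    hostLabels.clear();
    hostLabels.addAll(labels);
    for (Map.Entry<String, Set<String>> nm : nmLabels.entrySet()) {
      // nm.getValue() is null here, so the NM's old mapping is never removed
      replaceNodeForLabels(nm.getKey(), nm.getValue(), labels);
      nm.setValue(null);
    }
  }

  Set<String> nodesFor(String label) {
    return labelToNodes.getOrDefault(label, new TreeSet<>());
  }
}
```

In this model, registering "server001:45454", calling replaceOnHost("server001", {"cpunode"}) and then replaceOnHost("server001", {}) leaves "server001:45454" mapped to cpunode, which matches the label-mappings output of step 5.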