[ 
https://issues.apache.org/jira/browse/YARN-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253792#comment-17253792
 ] 

Eric Badger commented on YARN-10501:
------------------------------------

I'm looking at this patch and I have some questions about other pieces of code 
that are in the same section. I'll admit, this code is a little bit confusing 
to me because we have Hosts->Labels maps as well as Labels->Nodes maps and then 
on top of that, each Host can have multiple Nodes. Before I can comment on your 
patch, I think I need to clear up some things that are going on in this area of 
code that are confusing to me. Below is what I _think_ is happening. Feel free 
to correct me where I'm wrong.

(Assuming no port and/or wildcard port for this)
1. When we add a node label, we invoke this piece of code. 
{noformat}
        case ADD:
          addNodeToLabels(nodeId, labels);
          host.labels.addAll(labels);
          for (Node node : host.nms.values()) {
            if (node.labels != null) {
              node.labels.addAll(labels);
            }
            addNodeToLabels(node.nodeId, labels);
          }
          break;
{noformat}
  1a. This code adds the NodeId (without a port/with a wildcard port) to the 
Labels->Nodes map via addNodeToLabels. 
    *Why do we do this? There is no port associated with this node. In 1d we 
add the nodes to the map with their associated port, so I don't understand why 
we're adding the node here when it doesn't have a port.*
  1b. It adds all of the labels to the Host. This part doesn't make sense to 
me. *If we are giving Hosts the granularity to have multiple labels per host 
(due to multiple NMs), then why does the Host itself have labels?*
  1c. We add all the labels to each Node in the host, but _only_ if they 
already have labels. *Why do we only add the labels if they already have 
labels? Don't we want to add the labels regardless? Should it be possible for 
us to be in the ADD method while node.labels == null? Maybe this should throw 
an exception*
  1d. We add the Nodes (with their associated NM port) to the Labels->nodes map 
via addNodeToLabels.

2. When we replace the node label we invoke this piece of code
{noformat}
        case REPLACE:
          replaceNodeForLabels(nodeId, host.labels, labels);
          host.labels.clear();
          host.labels.addAll(labels);
          for (Node node : host.nms.values()) {
            replaceNodeForLabels(node.nodeId, node.labels, labels);
            node.labels = null;
          }
{noformat}
  2a. We remove the Node (without port or with wildcard port) from the specific 
label in the Labels->Nodes map via removeNoveFromLabels(). *Why do we have the 
node without a port in the first place?* This comes from 1a.
  2b. We add the Node (without port or with wildcard port) to the new specific 
label in the Labels->Nodes map via addNodeToLabels(). *Why do we add the node 
without a port?*
  2c. We clear the labels associated with the Host. *Why are there labels 
associated with a Host when each Host is actually a collection of Nodes?*
  2d. We add the new labels to the Host. Same question as 2c.
  2e. We iterate through the list of Nodes associated with each Host and 
perform 2a and 2b, except with Nodes that have their associated ports.
  2f. We set the Labels to Null for each Node associated with the Host. I don't 
understand the purpose of this. I must be missing something here.

Overall I have 2 main issues with the code that I need cleared up because I 
either don't understand the code or think things are broken/unnecessary. 
1) We have labels associated with Hosts (which are collections of Hosts) _and_ 
labels associated with just Nodes
2) We add Nodes that have no associated port or have the wildcard port _and_ 
add those same nodes with their associated ports.

Probably need [~leftnoteasy], [~varunsaxena], or [~sunilg] to comment on this 
since they were involved with YARN-3075 that added much of this code

> Can't remove all node labels after add node label without nodemanager port
> --------------------------------------------------------------------------
>
>                 Key: YARN-10501
>                 URL: https://issues.apache.org/jira/browse/YARN-10501
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 3.4.0
>            Reporter: caozhiqiang
>            Assignee: caozhiqiang
>            Priority: Critical
>         Attachments: YARN-10501.002.patch, YARN-10501.003.patch
>
>
> When add a label to nodes without nodemanager port or use WILDCARD_PORT (0) 
> port, it can't remove all label info in these nodes
> Reproduce process:
> {code:java}
> 1.yarn rmadmin -addToClusterNodeLabels "cpunode(exclusive=true)"
> 2.yarn rmadmin -replaceLabelsOnNode "server001=cpunode"
> 3.curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":["server001:0","server001:45454"],"partitionInfo":{"resourceAvailable":{"memory":"510","vCores":"1","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"510"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"1"}]}}}}}}}
> 4.yarn rmadmin -replaceLabelsOnNode "server001"
> 5.curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":"server001:45454","partitionInfo":{"resourceAvailable":{"memory":"0","vCores":"0","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"0"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"0"}]}}}}}}}
>  {code}
> You can see after the 4 process to remove nodemanager labels, the label info 
> is still in the node info.
> {code:java}
>  641 case REPLACE:
>  642 replaceNodeForLabels(nodeId, host.labels, labels);
>  643 replaceLabelsForNode(nodeId, host.labels, labels);
>  644 host.labels.clear();
>  645 host.labels.addAll(labels);
>  646 for (Node node : host.nms.values()) {
>  647 replaceNodeForLabels(node.nodeId, node.labels, labels);
>  649 node.labels = null;
>  650 }
>  651 break;{code}
> The cause is in 647 line, when add labels to node without port, the 0 port 
> and the real nm port with be both add to node info, and when remove labels, 
> the parameter node.labels in 647 line is null, so it will not remove the old 
> label. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to