Junping Du commented on YARN-3223:

Thanks [~brookz] for updating the patch! I just finished my review; here are my comments:
1. RMNodeImpl.java,
+  /**
+   * Set total resource for the node.
+   * @param totalResource {@link Resource} New total resource
+   */
+  void setTotalResource(Resource totalResource);
We should use RMNodeResourceUpdateEvent to update the resource instead of 
manipulating the resource on RMNode directly. Actually, a very similar interface 
(differing only in name) was added in YARN-311 but got removed in YARN-312, because 
RMNode is supposed to be a read-only interface, and all internal state changes 
are triggered by events instead of direct calls. Please refer to a similar 
comment here: 
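To illustrate the event-driven pattern, here is a minimal, self-contained sketch of routing a resource update through a dispatcher rather than a setter. The names ResourceUpdateEvent, RmNodeModel, and Dispatcher are simplified stand-ins for illustration only (the actual YARN event class is RMNodeResourceUpdateEvent), not the real implementation:

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class EventDrivenUpdate {
    // Simplified stand-ins, not the real YARN classes.
    record Resource(int memoryMB, int vcores) {}
    record ResourceUpdateEvent(String nodeId, Resource newTotal) {}

    // The node exposes a read-only view; mutation happens only inside the
    // event handler, mirroring how RMNodeImpl is meant to process events.
    static class RmNodeModel {
        private final String nodeId;
        private Resource total;
        RmNodeModel(String nodeId, Resource total) { this.nodeId = nodeId; this.total = total; }
        Resource getTotalResource() { return total; }
        // Only the dispatcher's drain loop calls this; callers never do.
        void handle(ResourceUpdateEvent e) { this.total = e.newTotal(); }
    }

    // A toy synchronous dispatcher: callers enqueue events instead of
    // calling setters on the node directly.
    static class Dispatcher {
        private final Queue<ResourceUpdateEvent> queue = new ArrayDeque<>();
        void dispatch(ResourceUpdateEvent e) { queue.add(e); }
        void drain(RmNodeModel node) {
            ResourceUpdateEvent e;
            while ((e = queue.poll()) != null) {
                if (e.nodeId().equals(node.nodeId)) node.handle(e);
            }
        }
    }

    public static void main(String[] args) {
        RmNodeModel node = new RmNodeModel("nm-1", new Resource(8192, 8));
        Dispatcher dispatcher = new Dispatcher();
        // Instead of node.setTotalResource(...), send an event:
        dispatcher.dispatch(new ResourceUpdateEvent("nm-1", new Resource(4096, 4)));
        dispatcher.drain(node);
        System.out.println(node.getTotalResource().memoryMB());
    }
}
```

The point of funneling every change through one handler is that the RMNode interface stays read-only and all state transitions remain observable in one place.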

2. AbstractYarnScheduler.java
+  /**
+   * Update the cluster resource based on a particular node's changes.
+   */
+  private synchronized void updateClusterResources(RMNode nm,
+      SchedulerNode node, Resource oldResource, Resource newResource) {
+    // Notify NodeLabelsManager about this change
+    rmContext.getNodeLabelManager().updateNodeResource(nm.getNodeID(),
+        newResource);
+    // Log resource change
+    LOG.info("Update resource on node: " + node.getNodeName()
+        + " from: " + oldResource + ", to: "
+        + newResource);
+    nodes.remove(nm.getNodeID());
+    updateMaximumAllocation(node, false);
+    // if node is decommissioning, preserve original total resource
+    if (nm.getState() == NodeState.DECOMMISSIONING){
+      node.setOriginalTotalResource(oldResource);
+    }
+    // update resource to node
+    node.setTotalResource(newResource);
+    // Update RMNode
+    nm.setTotalResource(newResource);
+    nodes.put(nm.getNodeID(), (N)node);
+    updateMaximumAllocation(node, true);
+    // update resource to clusterResource
+    Resources.subtractFrom(clusterResource, oldResource);
+    Resources.addTo(clusterResource, newResource);
+  }
Instead of setting the new resource directly on RMNode and SchedulerNode, we should add 
a new transition (it can be combined with UpdateNodeResourceWhenRunningTransition) for 
RMNodeImpl to handle RESOURCE_UPDATE for a node in the decommissioning stage. It would 
handle the scheduler node resource update and cluster resource update there (plus 
some other logic we could be missing here). To recover the previous capacity 
after recommission, I would prefer we keep the original resource on the RMNode side 
instead of SchedulerNode, as the value belongs more to RMNode itself, and the 
state transition from running to decommissioning can take a snapshot of the 
resource at that moment. The original resource should be updated if a node resource 
update is triggered from the CLI, but not when some running container finishes.
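The idea of sharing one transition across states can be sketched with a toy transition table. This is an illustrative model only, assuming simplified names (NodeModel, TABLE, addTransition); it is not the real RMNodeImpl state machine, which uses YARN's StateMachineFactory:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// A toy state machine illustrating the suggestion: register the same
// resource-update handler for both RUNNING and DECOMMISSIONING, so one
// transition (cf. UpdateNodeResourceWhenRunningTransition) serves both states.
public class TransitionTableSketch {
    enum NodeState { RUNNING, DECOMMISSIONING }
    enum EventType { RESOURCE_UPDATE }

    static class NodeModel {
        NodeState state = NodeState.RUNNING;
        int totalMemoryMB = 8192;
    }

    // Transition table keyed by (state, event type); handlers mutate the node.
    static final Map<NodeState, Map<EventType, BiConsumer<NodeModel, Integer>>>
        TABLE = new HashMap<>();

    static void addTransition(NodeState from, EventType event,
                              BiConsumer<NodeModel, Integer> handler) {
        TABLE.computeIfAbsent(from, s -> new HashMap<>()).put(event, handler);
    }

    public static void main(String[] args) {
        // One shared handler registered for RESOURCE_UPDATE in both states.
        BiConsumer<NodeModel, Integer> updateResource =
            (node, newMemory) -> node.totalMemoryMB = newMemory;
        addTransition(NodeState.RUNNING, EventType.RESOURCE_UPDATE, updateResource);
        addTransition(NodeState.DECOMMISSIONING, EventType.RESOURCE_UPDATE, updateResource);

        NodeModel node = new NodeModel();
        node.state = NodeState.DECOMMISSIONING;
        TABLE.get(node.state).get(EventType.RESOURCE_UPDATE).accept(node, 4096);
        System.out.println(node.totalMemoryMB);
    }
}
```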

BTW, it would be nice if we had tests to cover the key scenarios below: 
1. After an NM with running containers is put into the decommissioning stage, its 
total capacity can be updated to its used capacity, but the original capacity is 
preserved.
2. The total capacity gets updated when some running container finishes.
3. After recommissioning, the total capacity of the NM can go back to the original 
capacity.
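The three scenarios can be expressed as assertions against a minimal model. NodeModel, its field names, and the memory figures below are simplified assumptions for illustration, not the real RMNodeImpl or an actual YARN test:

```java
// A self-contained model checking the three scenarios: snapshot on
// decommission, shrink on container finish, restore on recommission.
public class GracefulDecommissionScenarios {
    enum NodeState { RUNNING, DECOMMISSIONING }

    static class NodeModel {
        NodeState state = NodeState.RUNNING;
        int totalMemoryMB;
        int originalMemoryMB = -1; // snapshot kept on the RMNode side

        NodeModel(int totalMemoryMB) { this.totalMemoryMB = totalMemoryMB; }

        void startDecommissioning(int usedMemoryMB) {
            originalMemoryMB = totalMemoryMB;   // snapshot before shrinking
            totalMemoryMB = usedMemoryMB;       // total tracks used capacity
            state = NodeState.DECOMMISSIONING;
        }

        void containerFinished(int releasedMemoryMB) {
            if (state == NodeState.DECOMMISSIONING) {
                totalMemoryMB -= releasedMemoryMB; // snapshot stays untouched
            }
        }

        void recommission() {
            totalMemoryMB = originalMemoryMB;    // roll back to the snapshot
            originalMemoryMB = -1;
            state = NodeState.RUNNING;
        }
    }

    public static void main(String[] args) {
        NodeModel node = new NodeModel(8192);

        // 1. Decommissioning shrinks total to used; original is preserved.
        node.startDecommissioning(3072);
        assert node.totalMemoryMB == 3072 && node.originalMemoryMB == 8192;

        // 2. Total updates when a running container finishes.
        node.containerFinished(1024);
        assert node.totalMemoryMB == 2048;

        // 3. Recommission restores the original capacity.
        node.recommission();
        assert node.totalMemoryMB == 8192;

        System.out.println("all scenarios pass: " + node.totalMemoryMB);
    }
}
```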

> Resource update during NM graceful decommission
> -----------------------------------------------
>                 Key: YARN-3223
>                 URL: https://issues.apache.org/jira/browse/YARN-3223
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, resourcemanager
>    Affects Versions: 2.7.1
>            Reporter: Junping Du
>            Assignee: Brook Zhou
>         Attachments: YARN-3223-v0.patch, YARN-3223-v1.patch, 
> YARN-3223-v2.patch
> During NM graceful decommission, we should handle resource update properly, 
> including: make RMNode keep track of the old resource for possible rollback, keep 
> the available resource at 0, and update the used resource when a
> container finishes.

This message was sent by Atlassian JIRA
