[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned by the Resource Manager as a response to the Application Master heartbeat

ASF GitHub Bot (JIRA) Wed, 15 Nov 2017 14:54:15 -0800

    [ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16254382#comment-16254382
 ]


ASF GitHub Bot commented on YARN-6483:
--------------------------------------

Github user juanrh commented on a diff in the pull request:

    https://github.com/apache/hadoop/pull/289#discussion_r151276382
  
    --- Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java ---
    @@ -1160,6 +1160,11 @@ public void transition(RMNodeImpl rmNode, RMNodeEvent event) {
           // Update NM metrics during graceful decommissioning.
           rmNode.updateMetricsForGracefulDecommission(initState, finalState);
           rmNode.decommissioningTimeout = timeout;
    +      // Notify NodesListManager to notify all RMApp so that each Application Master
    +      // could take any required actions.
    +      rmNode.context.getDispatcher().getEventHandler().handle(
    +          new NodesListManagerEvent(
    +              NodesListManagerEventType.NODE_USABLE, rmNode));
    --- End diff --
    
    I wasn't sure about using `NODE_USABLE`, but while making the changes you 
suggested I found that the existing test 
[`TestRMNodeTransitions.testResourceUpdateOnDecommissioningNode`](https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java#L994)
 already asserts that `NodesListManagerEventType.NODE_USABLE` is the event 
expected for a node that transitions to `DECOMMISSIONING`. Also, 
`NodesListManager.handle` translates the `NodesListManagerEventType` into the 
corresponding `RMAppNodeUpdateType` to build an `RMAppNodeUpdateEvent`, which 
is processed in `RMAppImpl.processNodeUpdate`, and that method uses the 
`RMAppNodeUpdateType` only for logging.
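
    The flow described above can be sketched roughly as follows. This is a simplified, self-contained illustration, not the Hadoop code: the enums and the `NodeUpdateSketch` class are hypothetical stand-ins that reuse the names from this comment, and the assumption that each enum has only the two values shown is mine.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the YARN types named in the comment.
// They are NOT the real Hadoop classes; the value sets are assumed.
enum NodeState { RUNNING, DECOMMISSIONING, DECOMMISSIONED }

enum NodesListManagerEventType { NODE_USABLE, NODE_UNUSABLE }

enum RMAppNodeUpdateType { NODE_USABLE, NODE_UNUSABLE }

final class NodeUpdateSketch {
    // Nodes this application would report back on the next AM heartbeat.
    private final List<String> updatedNodes = new ArrayList<>();

    // Sketch of the translation attributed to NodesListManager.handle:
    // each manager-level event type maps to the app-level update type.
    static RMAppNodeUpdateType toAppUpdateType(NodesListManagerEventType type) {
        switch (type) {
            case NODE_USABLE:
                return RMAppNodeUpdateType.NODE_USABLE;
            case NODE_UNUSABLE:
                return RMAppNodeUpdateType.NODE_UNUSABLE;
            default:
                throw new IllegalArgumentException("Unknown event type: " + type);
        }
    }

    // Sketch of the behaviour attributed to RMAppImpl.processNodeUpdate:
    // the update type is only logged, while the node itself is recorded
    // together with its actual state.
    void processNodeUpdate(String nodeId, NodeState state, RMAppNodeUpdateType type) {
        System.out.println("Node update " + type + " for " + nodeId + " (state " + state + ")");
        updatedNodes.add(nodeId + ":" + state);
    }

    List<String> getUpdatedNodes() {
        return updatedNodes;
    }
}
```

    Under these assumptions, a node entering `DECOMMISSIONING` still reaches the application with its real state even though the event is labelled `NODE_USABLE`, because the label affects only the log line.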
    
    So it looks like it is OK to use `NodesListManagerEventType.NODE_USABLE` 
for nodes in the `DECOMMISSIONING` state. Do you still think it's worth adding 
a new value to `NodesListManagerEventType` and `RMAppNodeUpdateType`?


> Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes 
> returned by the Resource Manager as a response to the Application Master 
> heartbeat
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-6483
>                 URL: https://issues.apache.org/jira/browse/YARN-6483
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>    Affects Versions: 2.8.0
>            Reporter: Juan Rodríguez Hortalá
>         Attachments: YARN-6483-v1.patch
>
>
> The DECOMMISSIONING node state is currently used as part of the graceful 
> decommissioning mechanism to give time for tasks to complete in a node that 
> is scheduled for decommission, and for reducer tasks to read the shuffle 
> blocks in that node. Also, YARN effectively blacklists nodes in 
> DECOMMISSIONING state by assigning them a capacity of 0, to prevent 
> additional containers from being launched in those nodes, so no more shuffle 
> blocks are written to the node. This blacklisting is not effective for 
> applications like Spark, because a Spark executor running in a YARN container 
> will keep receiving more tasks after the corresponding node has been 
> blacklisted at the YARN level. We would like to propose a modification of the 
> YARN heartbeat mechanism so nodes transitioning to DECOMMISSIONING are added 
> to the list of updated nodes returned by the Resource Manager as a response 
> to the Application Master heartbeat. This way a Spark application master 
> would be able to blacklist a DECOMMISSIONING node at the Spark level.
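
As a rough illustration of the proposal quoted above, an application master could react to the updated-nodes list roughly like this. It is a minimal sketch under stated assumptions: `SketchNodeState`, `UpdatedNode` and `DecommissionAwareAm` are hypothetical names invented for this example, not YARN or Spark classes, and the heartbeat response is modelled as a plain list.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of a node state carried in a heartbeat response.
enum SketchNodeState { RUNNING, DECOMMISSIONING, DECOMMISSIONED }

// Hypothetical entry of the "updated nodes" list returned to the AM.
final class UpdatedNode {
    final String host;
    final SketchNodeState state;

    UpdatedNode(String host, SketchNodeState state) {
        this.host = host;
        this.state = state;
    }
}

// Hypothetical application master that keeps an application-level
// exclusion list, as the issue proposes for Spark.
final class DecommissionAwareAm {
    private final Set<String> excludedHosts = new HashSet<>();

    // Called with the updated nodes from each heartbeat response.
    void onHeartbeat(List<UpdatedNode> updatedNodes) {
        for (UpdatedNode node : updatedNodes) {
            if (node.state == SketchNodeState.RUNNING) {
                // Node is usable again, so allow new tasks on it.
                excludedHosts.remove(node.host);
            } else {
                // DECOMMISSIONING or DECOMMISSIONED: stop sending new tasks.
                excludedHosts.add(node.host);
            }
        }
    }

    boolean canScheduleOn(String host) {
        return !excludedHosts.contains(host);
    }
}
```

The point of the sketch is that the exclusion happens in the application's own task scheduling, which a capacity of 0 at the YARN level cannot enforce for executors that are already running.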



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
