Juan Rodríguez Hortalá created YARN-6483:
--------------------------------------------
Summary: Add nodes transitioning to DECOMMISSIONING state to the
list of updated nodes returned by the Resource Manager as a response to the
Application Master heartbeat
Key: YARN-6483
URL: https://issues.apache.org/jira/browse/YARN-6483
Project: Hadoop YARN
Issue Type: Improvement
Components: resourcemanager
Affects Versions: 2.7.3
Reporter: Juan Rodríguez Hortalá
The DECOMMISSIONING node state is currently used as part of the graceful
decommissioning mechanism to give time for tasks to complete in a node that is
scheduled for decommission, and for reducer tasks to read the shuffle blocks in
that node. Also, YARN effectively blacklists nodes in DECOMMISSIONING state by
assigning them a capacity of 0, which prevents additional containers from being
launched on those nodes, so no more shuffle blocks are written to them. This
blacklisting is not effective for applications like Spark, because a Spark
executor running in a YARN container will keep receiving more tasks after the
corresponding node has been blacklisted at the YARN level. We would like to
propose a modification of the YARN heartbeat mechanism so nodes transitioning
to DECOMMISSIONING are added to the list of updated nodes returned by the
Resource Manager as a response to the Application Master heartbeat. This way a
Spark application master would be able to blacklist a DECOMMISSIONING node at
the Spark level.
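
As a rough illustration of what an application master could do with this
information, here is a minimal, self-contained sketch. It does not use the real
YARN or Spark classes: NodeState, NodeUpdate and applyUpdates are hypothetical
stand-ins for the node reports in the heartbeat response and for an
application-level blacklist. Removing a node from the blacklist when it is
reported as RUNNING again is also an assumption, not part of the proposal.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class DecommissionAwareBlacklist {
    /** Simplified stand-in for YARN's node states; not the real enum. */
    enum NodeState { RUNNING, DECOMMISSIONING, DECOMMISSIONED }

    /** Hypothetical entry of the "updated nodes" list in a heartbeat response. */
    static final class NodeUpdate {
        final String host;
        final NodeState state;

        NodeUpdate(String host, NodeState state) {
            this.host = host;
            this.state = state;
        }
    }

    /**
     * Applies one heartbeat's worth of node updates to an application-level
     * blacklist: hosts that are DECOMMISSIONING or DECOMMISSIONED are added,
     * hosts reported as RUNNING again are removed (assumed behaviour).
     */
    static void applyUpdates(Set<String> blacklist, List<NodeUpdate> updates) {
        for (NodeUpdate update : updates) {
            if (update.state == NodeState.RUNNING) {
                blacklist.remove(update.host);
            } else {
                blacklist.add(update.host);
            }
        }
    }

    public static void main(String[] args) {
        Set<String> blacklist = new TreeSet<>();
        applyUpdates(blacklist, Arrays.asList(
                new NodeUpdate("node-a", NodeState.DECOMMISSIONING),
                new NodeUpdate("node-b", NodeState.RUNNING)));
        // Only the node that started decommissioning ends up blacklisted.
        System.out.println(blacklist);
    }
}
```

With the proposed change, the application would learn about the node as soon as
it enters DECOMMISSIONING, so it could stop scheduling tasks there before the
node is finally decommissioned.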