GitHub user juanrh opened a pull request: https://github.com/apache/spark/pull/19267
[WIP][SPARK-20628][CORE] Blacklist nodes when they transition to DECOMMISSIONING state in YARN

## What changes were proposed in this pull request?

Dynamic cluster configurations, where cluster nodes are added and removed frequently, are common in public cloud environments, so this has become a [problem for Spark users](https://www.trulia.com/blog/tech/aws-emr-ad-hoc-spark-development-environme... ). To cope with this we propose a mechanism along the lines of YARN's support for [graceful node decommission](https://issues.apache.org/jira/browse/YARN-914 ) or the [Mesos maintenance primitives](https://mesos.apache.org/documentation/latest/maintenance/ ). These changes allow cluster nodes to be transitioned to a "decommissioning" state, at which point no more tasks are scheduled on the executors running on those nodes. After a configurable drain time, nodes in the "decommissioning" state transition to a "decommissioned" state, where their shuffle blocks are no longer available. Shuffle blocks stored on nodes in the "decommissioning" state remain available to other executors. By preventing more tasks from running on nodes in the "decommissioning" state, we avoid creating more shuffle blocks on those nodes, since those blocks won't be available once the nodes eventually transition to the "decommissioned" state.

We have implemented a first version of this proposal for YARN, using Spark's [blacklisting mechanism for task scheduling](https://issues.apache.org/jira/browse/SPARK-8425 ), available at the node level since Spark 2.2.0, to ensure tasks are not scheduled on nodes in the "decommissioning" state. With this solution it is the cluster manager, not the Spark application, that tracks the status of each node and handles the transition from "decommissioning" to "decommissioned"; the Spark driver simply reacts to the node state transitions. A small illustrative sketch of this flow is included after the commit list below.

## How was this patch tested?

All functionality has been tested with unit tests, with an integration test based on BaseYarnClusterSuite, and with manual testing on a cluster.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/juanrh/spark SPARK-20628-yarn-decommissioning-blacklisting

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19267.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19267

----

commit 35285871a40fbf0903784dfb8dc16bd4e37062fe
Author: Juan Rodriguez Hortala <hort...@amazon.com>
Date: 2017-08-23T16:58:24Z

    Send Host status update signals to YarnSchedulerBackend, on Yarn node state changes

commit 42891dafcfe7d25026719b6025c1272e7fe2d947
Author: Juan Rodriguez Hortala <hort...@amazon.com>
Date: 2017-08-23T18:49:04Z

    Add mechanism to Blacklist/Unblacklist nodes based on Node status changes in Cluster Manager

commit 0c840c74a85012a4d91e349ae4830455ef3d680b
Author: Juan Rodriguez Hortala <hort...@amazon.com>
Date: 2017-08-23T19:56:31Z

    Add configuration to independently enable/disable task execution blacklisting and decommissioning blacklisting

commit f9fdfb01ac2486cc268b13568d129f926b3b8ab2
Author: Juan Rodriguez Hortala <hort...@amazon.com>
Date: 2017-08-23T20:29:49Z

    Integration test for Yarn node decommissioning

----
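The sketch below summarizes the intended driver-side behavior described above. It is not code from this pull request: `NodeState`, `DecommissionAwareScheduler`, and `onNodeStateChange` are hypothetical names used only to illustrate how a driver could react to the two node-state transitions, assuming "decommissioning" blacklists a host for task scheduling while keeping its shuffle blocks readable, and "decommissioned" additionally marks those blocks as lost.

```scala
import scala.collection.mutable

// Hypothetical node states, mirroring the YARN states discussed in the PR.
sealed trait NodeState
case object Running         extends NodeState
case object Decommissioning extends NodeState
case object Decommissioned  extends NodeState

// Minimal stand-in for the driver-side view of cluster nodes; not the PR's actual classes.
final class DecommissionAwareScheduler {
  // Hosts on which no new tasks should be scheduled.
  private val blacklistedHosts = mutable.Set.empty[String]
  // Hosts whose shuffle blocks can no longer be fetched.
  private val lostShuffleHosts = mutable.Set.empty[String]

  // Called when the cluster-manager backend reports a node state update.
  def onNodeStateChange(host: String, state: NodeState): Unit = state match {
    case Decommissioning =>
      // Stop scheduling tasks here so no new shuffle blocks are written,
      // but existing blocks remain readable during the drain period.
      blacklistedHosts += host
    case Decommissioned =>
      // The node is gone: its shuffle output must be treated as lost.
      blacklistedHosts += host
      lostShuffleHosts += host
    case Running =>
      // A node that stays in (or returns to) service is usable again.
      blacklistedHosts -= host
      lostShuffleHosts -= host
  }

  def canScheduleOn(host: String): Boolean = !blacklistedHosts.contains(host)
  def shuffleAvailableOn(host: String): Boolean = !lostShuffleHosts.contains(host)
}

object DecommissionFlowDemo extends App {
  val scheduler = new DecommissionAwareScheduler
  scheduler.onNodeStateChange("node-1", Decommissioning)
  assert(!scheduler.canScheduleOn("node-1"))      // no new tasks on a draining node
  assert(scheduler.shuffleAvailableOn("node-1"))  // its shuffle blocks are still readable
  scheduler.onNodeStateChange("node-1", Decommissioned)
  assert(!scheduler.shuffleAvailableOn("node-1")) // blocks are lost once decommissioned
  println("decommissioning flow behaves as described")
}
```

In the actual change, per the commits above, the equivalent of `onNodeStateChange` would be driven by host status updates delivered to YarnSchedulerBackend, and the blacklisting for decommissioning can be enabled or disabled independently of task-failure blacklisting via configuration.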