GitHub user dhruve opened a pull request:
https://github.com/apache/spark/pull/22015
[SPARK-20286] Release executors on unpersisting RDD
## What changes were proposed in this pull request?
Currently, executors acquired through dynamic allocation are not released
when the cached RDD they hold is unpersisted, which wastes cluster
resources. With this change, once a cached RDD is unpersisted, we check
whether each affected executor has any running tasks:
1 - If the executor still holds cached blocks from other RDDs, we make no
change.
2 - If the executor has no remaining cached RDD blocks and no running tasks,
we update its removal time based on the conf
`spark.dynamicAllocation.cachedExecutorIdleTimeout`, so the now-idle executor
can be released.
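For reference, this behavior applies under a dynamic-allocation setup along the lines of the following `spark-defaults.conf` fragment; the 60s timeout values are illustrative only (they match the manual test described in this PR, not Spark's defaults):

```
spark.dynamicAllocation.enabled                    true
spark.shuffle.service.enabled                      true
spark.dynamicAllocation.executorIdleTimeout        60s
spark.dynamicAllocation.cachedExecutorIdleTimeout  60s
```

Note that `cachedExecutorIdleTimeout` defaults to infinity, which is why executors holding cached blocks were previously never reclaimed.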
## How was this patch tested?
Manually, using the following code snippet.
```scala
val rdd = sc.textFile("smallFile")
rdd.cache
val rdd2 = sc.textFile("largeFile")
rdd2.cache
rdd2.count // cached data on around 500+ executors
Thread.sleep(30000) // sleep for 30s
rdd.count // cached data on around 20 executors
// Verify only 20 executors remain; the rest time out based on the
// idleTimeout, which I set to 60s.
rdd2.unpersist
// Eventually all executors are released, as there are no tasks
// running on any executor.
```
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/dhruve/spark bug/SPARK-20286
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22015.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22015
----
commit 7e229dc120d0f1542cc8d7dbac1027baac36665e
Author: Dhruve Ashar <dhruveashar@...>
Date: 2018-08-06T20:32:47Z
[SPARK-20286] Release executors on unpersisting RDD
----