Brad created SPARK-21097:
----------------------------

             Summary: Dynamic allocation will preserve cached data
                 Key: SPARK-21097
                 URL: https://issues.apache.org/jira/browse/SPARK-21097
             Project: Spark
          Issue Type: Improvement
          Components: Block Manager, Scheduler, Spark Core
    Affects Versions: 2.2.0, 2.3.0
            Reporter: Brad


We want to use dynamic allocation to distribute resources among many notebook 
users on our spark clusters. One difficulty is that if a user has cached data 
then we are either prevented from de-allocating any of their executors, or we 
are forced to drop their cached data, which can lead to a bad user experience.

We propose adding a feature to preserve cached data by copying it to other 
executors before de-allocation. This behavior would be enabled by a simple 
spark config like "spark.dynamicAllocation.recoverCachedData". Now when an 
executor reaches its configured idle timeout, instead of just killing it on the 
spot, we will stop sending it new tasks, replicate all of its rdd blocks onto 
other executors, and then kill it. If there is an issue while we replicate the 
data, like an error, it takes too long, or there isn't enough space, then we 
will fall back to the original behavior and drop the data and kill the executor.

This feature should allow anyone with notebook users to use their cluster 
resources more efficiently. Also since it will be completely opt-in it will 
unlikely to cause problems for other use cases. 




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to