Jeffrey Charles created SPARK-24786:
---------------------------------------

             Summary: Executors not being released after all cached data is unpersisted
                 Key: SPARK-24786
                 URL: https://issues.apache.org/jira/browse/SPARK-24786
             Project: Spark
          Issue Type: Bug
          Components: Scheduler
    Affects Versions: 2.2.1
         Environment: Zeppelin in EMR
            Reporter: Jeffrey Charles


I'm persisting a DataFrame in Zeppelin, with dynamic allocation enabled, to get a
sense of how much memory the DataFrame takes up. After I note the size, I
unpersist the DataFrame. For some reason, YARN is not releasing the executors
that were added to Zeppelin. If I don't run the persist and unpersist steps,
the executors that were added are removed about a minute after the paragraphs
complete. Looking at the Storage tab in the Spark UI for the Zeppelin job, I
don't see anything cached. I do not want to lower
spark.dynamicAllocation.cachedExecutorIdleTimeout, because executors that still
hold cached data should not be released; however, executors that previously held
cached data and no longer do should be released.
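
For reference, a sketch of how the relevant settings would look in
spark-defaults.conf. The shuffle-service line is the usual prerequisite for
dynamic allocation on YARN and is an assumption here, not something stated
above:

{code}
spark.dynamicAllocation.enabled                true
spark.shuffle.service.enabled                  true
spark.dynamicAllocation.executorIdleTimeout    60s
# spark.dynamicAllocation.cachedExecutorIdleTimeout is left at its
# default, which the documentation describes as "infinity"
{code}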

 

Steps to reproduce (a Scala sketch of these steps follows the list):
 # Enable dynamic allocation
 # Set spark.dynamicAllocation.executorIdleTimeout to 60s
 # Set spark.dynamicAllocation.cachedExecutorIdleTimeout to infinity (its documented default)
 # Load a dataset, persist it, run a count on the persisted dataset, then unpersist it
 # Wait a couple of minutes
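
A minimal self-contained sketch of those steps in Scala. In Zeppelin itself the
SparkSession is pre-built and these properties belong in the interpreter
settings (or spark-defaults.conf as above) rather than in the paragraph; the
app name and input path below are placeholders, not values from this report:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("dynalloc-unpersist-repro") // placeholder name
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.shuffle.service.enabled", "true") // prerequisite for dynamic allocation on YARN
  .config("spark.dynamicAllocation.executorIdleTimeout", "60s")
  // cachedExecutorIdleTimeout is left at its default, described as "infinity" in the docs
  .getOrCreate()

val df = spark.read.parquet("/path/to/dataset") // placeholder path

df.persist()   // pin the data in executor storage
df.count()     // materialize the cache
df.unpersist() // release the cached blocks

// Expected: executors become idle and are reclaimed after ~60s.
// Observed: they are retained indefinitely, as described above.
{code}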

Expected behaviour:

All executors are released once the 60s idle timeout elapses, since they are no longer caching any data

Observed behaviour:

No executors are released, even well beyond the idle timeout


