Jeffrey Charles created SPARK-24786:
---------------------------------------
Summary: Executors not being released after all cached data is unpersisted
Key: SPARK-24786
URL: https://issues.apache.org/jira/browse/SPARK-24786
Project: Spark
Issue Type: Bug
Components: Scheduler
Affects Versions: 2.2.1
Environment: Zeppelin in EMR
Reporter: Jeffrey Charles

I'm persisting a dataframe in Zeppelin, which has dynamic allocation enabled, to get a sense of how much memory the dataframe takes up. After I note the size, I unpersist the dataframe. For some reason, YARN is not releasing the executors that were added to Zeppelin. If I don't run the persist and unpersist steps, the executors that were added are removed about a minute after the paragraphs complete. Looking at the Storage tab in the Spark UI for the Zeppelin job, I don't see anything cached.

I do not want to set spark.dynamicAllocation.cachedExecutorIdleTimeout to a lower value, because I do not want executors that currently hold cached data to be released; I do, however, want executors that previously held cached data, but no longer do, to be released.

Steps to reproduce:
# Enable dynamic allocation
# Set spark.dynamicAllocation.executorIdleTimeout to 60s
# Set spark.dynamicAllocation.cachedExecutorIdleTimeout to infinity
# Load a dataset, persist it, run a count on the persisted dataset, then unpersist it
# Wait a couple of minutes

Expected behaviour: all executors are released, since they no longer cache any data
Observed behaviour: no executors are released
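For reference, the configuration in the reproduction steps might be sketched as a spark-defaults.conf fragment like the one below. The timeout values are taken from the steps above; the shuffle-service property is an assumption on my part, since dynamic allocation on YARN typically requires the external shuffle service to be enabled:

```properties
# Dynamic allocation with a short idle timeout for executors without cached data
spark.dynamicAllocation.enabled                    true
spark.shuffle.service.enabled                      true
spark.dynamicAllocation.executorIdleTimeout        60s
# Never reclaim executors that hold cached blocks (infinity is also Spark's default)
spark.dynamicAllocation.cachedExecutorIdleTimeout  infinity
```

With these settings, an executor that has never held cached blocks should be removed after 60s of idleness; the issue reported here is that an executor keeps being treated as "cached" even after every block on it has been unpersisted.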