[ 
https://issues.apache.org/jira/browse/SPARK-31208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-31208.
-----------------------------------
    Fix Version/s: 3.1.0
       Resolution: Fixed

Issue resolved by pull request 28038
[https://github.com/apache/spark/pull/28038]

> Expose the ability for user to cleanup shuffle files
> ----------------------------------------------------
>
>                 Key: SPARK-31208
>                 URL: https://issues.apache.org/jira/browse/SPARK-31208
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes
>    Affects Versions: 3.0.0, 3.1.0
>            Reporter: Holden Karau
>            Assignee: Holden Karau
>            Priority: Major
>             Fix For: 3.1.0
>
>
> Dynamic scaling on Kubernetes (introduced in Spark 3) depends on shutting 
> down only executors that hold no shuffle files. However, Spark does not 
> aggressively clean up shuffle files (see SPARK-5836) and instead depends on 
> JVM GC on the driver to trigger deletes. We already have a mechanism to 
> explicitly clean up shuffle files from the ALS algorithm, where we create a 
> lot of quickly orphaned shuffle files. We should expose this as an advanced 
> developer feature so that people can better clean up shuffle files, improving 
> dynamic scaling of their jobs on Kubernetes.
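
For context, the settings that govern the behaviour described above can be
sketched as a spark-defaults.conf fragment. The property names are existing
Spark configuration keys; the values are illustrative assumptions, not
recommendations, and nothing here is taken from pull request 28038 itself.

```
# Dynamic allocation without an external shuffle service (Spark 3.0+).
# Executors that still hold shuffle files are kept alive while tracking is on.
spark.dynamicAllocation.enabled                   true
spark.dynamicAllocation.shuffleTracking.enabled   true

# Assumed value: release executors holding shuffle data after this long,
# even if the shuffle has not yet been garbage collected on the driver.
spark.dynamicAllocation.shuffleTracking.timeout   30min

# Assumed value: run the driver's periodic GC more often than the 30min
# default so that unreferenced shuffles are cleaned up sooner.
spark.cleaner.periodicGC.interval                 10min
```

These settings only shorten the wait for cleanup. They do not replace the
explicit, developer-triggered cleanup that this issue asks to expose.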



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
