This is the job of ContextCleaner. There are a few properties that you can tweak to see if they help:

spark.cleaner.periodicGC.interval
spark.cleaner.referenceTracking
spark.cleaner.referenceTracking.blocking.shuffle
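As a minimal sketch of setting these on the application side (assuming you build the session yourself; the application name and the 5-minute interval are only illustrative values, the interval default is 30min):

import org.apache.spark.sql.SparkSession

// Illustrative only: keep reference tracking on (the default), make shuffle
// cleanup blocking, and run the periodic driver GC more often than the
// default 30min so weak references to unused shuffles are processed sooner.
val spark = SparkSession.builder()
  .appName("long-running-app")  // hypothetical name
  .config("spark.cleaner.referenceTracking", "true")
  .config("spark.cleaner.referenceTracking.blocking.shuffle", "true")
  .config("spark.cleaner.periodicGC.interval", "5min")
  .getOrCreate()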
Regards
Prathmesh Ranaut

> On Jul 21, 2019, at 11:31 AM, Alex Landa <metalo...@gmail.com> wrote:
>
> Hi,
>
> We are running a long-running Spark application (which executes lots of
> quick jobs using our scheduler) on a Spark standalone cluster, version 2.4.0.
> We see that old shuffle files (a week old, for example) are not deleted
> during the execution of the application, which leads to out-of-disk-space
> errors on the executor.
> If we re-deploy the application, the Spark cluster takes care of the cleaning
> and deletes the old shuffle data (since we have
> -Dspark.worker.cleanup.enabled=true in the worker config).
> I don't want to re-deploy our app every week or two, but to be able to
> configure Spark to clean old shuffle data (as it should).
>
> How can I configure Spark to delete old shuffle data during the lifetime of
> the application (not after)?
>
> Thanks,
> Alex
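For reference, the worker-side cleanup mentioned above is configured through SPARK_WORKER_OPTS in conf/spark-env.sh on each standalone worker, and it only removes the directories of stopped applications, which is why it does not help while the application keeps running. A sketch with the default interval (30 minutes) and TTL (7 days) spelled out, purely as an illustration:

export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
  -Dspark.worker.cleanup.interval=1800 \
  -Dspark.worker.cleanup.appDataTtl=604800"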