Hi,

We are running a long-running Spark application (it executes many short jobs through our own scheduler) on a Spark standalone cluster, version 2.4.0. We see that old shuffle files (a week old, for example) are not deleted while the application is running, which eventually leads to out-of-disk-space errors on the executors. If we re-deploy the application, the cluster takes care of the cleanup and deletes the old shuffle data, since we have -Dspark.worker.cleanup.enabled=true in the worker config. I don't want to re-deploy our app every week or two; I want to configure Spark to clean old shuffle data on its own (as it should).
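For reference, our worker-side cleanup settings look roughly like this (a sketch of spark-env.sh; only the cleanup.enabled flag is quoted from our actual config, the interval and TTL values shown are just the Spark 2.4.0 defaults):

    # spark-env.sh on each worker node
    # Periodically sweep the work directories of *stopped* applications.
    # As we read the standalone-mode docs, this sweep only covers stopped
    # apps, which would explain why our long-running app is never cleaned.
    SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
      -Dspark.worker.cleanup.interval=1800 \
      -Dspark.worker.cleanup.appDataTtl=604800"

We have also looked at the driver-side ContextCleaner, which (as we understand it) removes shuffle files once the corresponding RDD/shuffle references are garbage-collected on the driver, e.g.:

    # spark-defaults.conf on the driver (a guess on our part, not verified)
    # Force a periodic driver GC so the ContextCleaner can release shuffle
    # files whose references have gone out of scope.
    spark.cleaner.periodicGC.interval   30min

but we are not sure this is the right knob, hence the question below.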
How can I configure Spark to delete old shuffle data during the lifetime of the application (not only after it stops)?

Thanks,
Alex