Hi Deb,

If you don't have long-running Spark applications (ones that run longer than spark.worker.cleanup.appDataTtl), then the TTL-based cleaner is a good solution. If, however, you have a mix of long-running and short-running applications, the TTL-based approach will fail: it will clean up data from applications that are still running, which causes problems.
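For reference, the worker cleanup is enabled through SPARK_WORKER_OPTS in conf/spark-env.sh. A minimal sketch using the properties from the standalone docs (the interval and TTL values here are just illustrative, not recommendations):

```shell
# conf/spark-env.sh -- enable the standalone worker's TTL-based cleanup.
# cleanup.interval: how often (seconds) the worker scans for old app dirs.
# cleanup.appDataTtl: age (seconds) after which an app directory is removed.
export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
 -Dspark.worker.cleanup.interval=1800 \
 -Dspark.worker.cleanup.appDataTtl=86400"
```

Note this only runs on standalone-mode workers, and (per the issue below) it currently removes directories purely by age, without checking whether the owning application is still alive.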
http://spark.apache.org/docs/latest/spark-standalone.html

To make this work properly, the worker directory cleanup needs to clean up only directories from terminated applications, and leave directories from running applications alone regardless of their age. This is tracked at https://issues.apache.org/jira/browse/SPARK-1860

On Wed, Aug 13, 2014 at 9:47 PM, Debasish Das <debasish.da...@gmail.com> wrote:
> Hi,
>
> I have set up the SPARK_LOCAL_DIRS option in spark-env.sh so that Spark
> can use more shuffle space...
>
> Does Spark clean up all the shuffle files once the runs are done? It seems
> to me that the shuffle files are not cleaned...
>
> Do I need to set this variable: spark.cleaner.ttl?
>
> Right now we are planning to use logrotate to clean up the shuffle
> files... is that a good practice?
>
> Thanks,
> Deb