Recently I ran into an issue with Spark 1.5.2 in standalone mode: Spark
does not clean up garbage in the blockmgr folders on the slaves until I
exit spark-shell.
I opened spark-shell and ran my Spark program over several input folders.
Then I noticed that Spark was using several GBs of disk space on all
slaves in the blockmgr folders, e.g.
spark/spark-xxx/executor-yyy/blockmgr-zzz

Yes, I have several RDDs in memory, but according to the Spark UI all of
them use memory only (not disk). The RDDs are cached at the beginning of
data processing, and at that point the blockmgr folders are almost empty.
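
For context, the caching looks roughly like this (the input path and
the RDD name below are placeholders, not my actual job):

  import org.apache.spark.storage.StorageLevel

  // Load one input folder and cache it in memory only, which matches
  // what the Spark UI reports (memory only, no disk usage).
  val rdd = sc.textFile("hdfs:///input/folder-1")
  rdd.persist(StorageLevel.MEMORY_ONLY)
  rdd.count()  // materialize the cache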

So it looks like the jobs I ran in the shell produced some garbage in
the blockmgr folders, and Spark did not clean those folders up after the
jobs were done. If I exit spark-shell, the blockmgr folders are cleaned
instantly.

How can I force Spark to clean the blockmgr folders without exiting the
shell? Should I use the spark.cleaner.ttl setting?
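
In case it helps, this is what I am considering (the TTL value is just
an example, and I am not sure either approach actually removes the
on-disk blocks):

  // Option 1: explicitly drop cached RDDs that are no longer needed;
  // blocking = true waits until the blocks are actually removed.
  rdd.unpersist(blocking = true)

  // Option 2 (what I am asking about): periodic cleanup of old
  // metadata and blocks via spark.cleaner.ttl, set at shell launch:
  //   spark-shell --conf spark.cleaner.ttl=3600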
