I set SPARK_LOCAL_DIRS to C:\temp\spark-temp. When RDDs are shuffled, Spark writes to this folder. I found that the disk usage of this folder keeps increasing quickly, and at some point I will run out of disk space.
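For reference, here is a minimal sketch of how the growth could be measured per subfolder. It only reports the size and last-modified age of each subfolder and deletes nothing. The helper names (dir_size_bytes, summarize_subfolders) are made up for illustration, and the last-modified age is just an observation, not a confirmed indicator that Spark is done with a folder.

```python
import os
import time


def dir_size_bytes(path):
    """Sum the sizes of all files under path (recursively)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # A file may disappear while a job is still running; skip it.
                pass
    return total


def summarize_subfolders(local_dir):
    """Return (name, size_in_bytes, hours_since_last_modified) for each
    immediate subfolder of local_dir, largest first."""
    now = time.time()
    rows = []
    for entry in os.scandir(local_dir):
        if entry.is_dir(follow_symlinks=False):
            age_hours = (now - entry.stat().st_mtime) / 3600.0
            rows.append((entry.name, dir_size_bytes(entry.path), age_hours))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


if __name__ == "__main__":
    # Path taken from the SPARK_LOCAL_DIRS setting described above.
    local_dir = r"C:\temp\spark-temp"
    if os.path.isdir(local_dir):
        for name, size, age_hours in summarize_subfolders(local_dir):
            print(f"{name}\t{size / 1e6:.1f} MB\tlast modified {age_hours:.1f} h ago")
```

Running this periodically while a job is active would at least show which subfolders are growing and which have been idle.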
I wonder whether Spark cleans up the disk space in this folder once the shuffle operation is done. If not, I need to write a job to clean it up myself. But how do I know which subfolders there can be removed?
Ningjun