Hi Michael,

I think you can set spark.cleaner.ttl=xxx to enable the time-based metadata 
cleaner, which will remove old, unused shuffle data once it times out.

In Spark 1.0 another option is to clean shuffle data via weak-reference 
tracking (configured with spark.cleaner.referenceTracking), which is enabled 
by default.
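
For example, in spark-defaults.conf (the 3600 here is just an illustrative 
value, in seconds -- pick something longer than your longest-lived RDD):

    # Time-based metadata/shuffle cleanup (pre-1.0 mechanism)
    spark.cleaner.ttl                 3600
    # Reference-tracking cleanup (Spark 1.0+); true is already the default
    spark.cleaner.referenceTracking   true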

Thanks
Saisai

From: Michael Chang [mailto:[email protected]]
Sent: Friday, June 13, 2014 10:15 AM
To: [email protected]
Subject: Re: Spilled shuffle files not being cleared

Bump

On Mon, Jun 9, 2014 at 3:22 PM, Michael Chang <[email protected]> wrote:
Hi all,

I'm seeing exceptions like the one below in Spark 0.9.1.  It looks like 
I'm running out of inodes on my machines (I have around 300k each in a 
12-machine cluster).  I took a quick look and I'm seeing some shuffle spill 
files that are still around even 12 minutes after they were created.  Can 
someone help me understand when these shuffle spill files should be cleaned 
up?  (Is it as soon as they are used?)

Thanks,
Michael


java.io.FileNotFoundException: /mnt/var/hadoop/1/yarn/local/usercache/ubuntu/appcache/application_1399886706975_13107/spark-local-20140609210947-19e1/1c/shuffle_41637_3_0 (No space left on device)
        at java.io.FileOutputStream.open(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
        at org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:118)
        at org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:179)
        at org.apache.spark.scheduler.ShuffleMapTask$$anonfun$runTask$1.apply(ShuffleMapTask.scala:164)
        at org.apache.spark.scheduler.ShuffleMapTask$$anonfun$runTask$1.apply(ShuffleMapTask.scala:161)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:161)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
        at org.apache.spark.scheduler.Task.run(Task.scala:53)
        at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:211)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
14/06/09 22:07:36 WARN TaskSetManager: Lost TID 667432 (task 86909.0:7)
14/06/09 22:07:36 WARN TaskSetManager: Loss was due to java.io.FileNotFoundException
