Hi Dan,

Spark cleans up its temp files after a run (IIRC), so you won't see the drive out of space once the run completes. In any case, by default Spark puts shuffle files under /tmp/ (this is controlled by the spark.local.dir parameter). I assume you're running on EC2? You'll probably want to override spark.local.dir to point at one (or more) of the /mnt*/ drives, which have much more free space than the default shuffle directory.
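As a minimal sketch (assuming a Spark 1.x standalone cluster; the app name and mount points below are placeholders, so substitute whatever df -h shows on your workers):

    import org.apache.spark.{SparkConf, SparkContext}

    object ShuffleDirExample {
      def main(args: Array[String]): Unit = {
        // spark.local.dir accepts a comma-separated list, so shuffle files
        // get striped across the listed disks. These paths are assumptions --
        // use whatever /mnt* volumes actually exist on your instances.
        val conf = new SparkConf()
          .setAppName("shuffle-dir-example") // hypothetical app name
          .set("spark.local.dir", "/mnt/spark,/mnt2/spark")
        val sc = new SparkContext(conf)
        // ... run the job that writes to S3 ...
        sc.stop()
      }
    }

Equivalently, you can set the same property in conf/spark-defaults.conf (or set SPARK_LOCAL_DIRS in spark-env.sh) on each worker; note that SPARK_LOCAL_DIRS, if set by the cluster manager, takes precedence over the property.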
Regards,

Frank Austin Nothaft
fnoth...@berkeley.edu
fnoth...@eecs.berkeley.edu
202-340-0466

On Aug 27, 2014, at 4:12 PM, Daniil Osipov <daniil.osi...@shazam.com> wrote:

> Hello,
>
> I've been seeing the following errors when trying to save to S3:
>
> Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 4058 in stage 2.1 failed 4 times, most recent failure: Lost task 4058.3 in stage 2.1 (TID 12572, ip-10-81-151-40.ec2.internal): java.io.FileNotFoundException: /mnt/spark/spark-local-20140827191008-05ae/0c/shuffle_1_7570_5768 (No space left on device)
>         java.io.FileOutputStream.open(Native Method)
>         java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:107)
>         org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:175)
>         org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:67)
>         org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:65)
>         scala.collection.Iterator$class.foreach(Iterator.scala:727)
>         scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>         org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
>         org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>         org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         org.apache.spark.scheduler.Task.run(Task.scala:54)
>         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         java.lang.Thread.run(Thread.java:745)
>
> df tells me there is plenty of space left on the worker node:
> [root@ip-10-81-151-40 ~]$ df -h
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/xvda1      7.9G  4.6G  3.3G  59% /
> tmpfs           7.4G     0  7.4G   0% /dev/shm
> /dev/xvdb        37G   11G   25G  30% /mnt
> /dev/xvdf        37G  9.5G   26G  27% /mnt2
>
> Any suggestions?
> Dan