At first glance this looks like a huge shuffle. What operations are you invoking?
Also, is it actually filling up all available space, or just the disk you
configured to hold spark.local.dir? If that's /tmp, you could simply be
filling up a tiny /tmp partition.
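If it is, one option is to point spark.local.dir at a larger volume. A minimal
sketch (the path and app name below are made up, and on a managed cluster the
SPARK_LOCAL_DIRS environment variable set by the cluster manager usually
overrides whatever the application sets):

  import org.apache.spark.{SparkConf, SparkContext}

  // Use a scratch directory with more free space than /tmp for shuffle
  // spill and other local files. The path here is hypothetical.
  val conf = new SparkConf()
    .setAppName("local-dir-example")
    .set("spark.local.dir", "/mnt/bigdisk/spark-tmp")
  val sc = new SparkContext(conf)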

On Wed, Feb 11, 2015 at 3:39 AM, Peng Cheng <rhw...@gmail.com> wrote:
> I'm running a small job on a cluster with 15G of mem and 8G of disk per
> machine.
>
> The job always gets into a deadlock where the last error message is:
>
> java.io.IOException: No space left on device
>       at java.io.FileOutputStream.writeBytes(Native Method)
>       at java.io.FileOutputStream.write(FileOutputStream.java:345)
>       at org.apache.spark.storage.DiskBlockObjectWriter$TimeTrackingOutputStream$$anonfun$write$3.apply$mcV$sp(BlockObjectWriter.scala:86)
>       at org.apache.spark.storage.DiskBlockObjectWriter.org$apache$spark$storage$DiskBlockObjectWriter$$callWithTiming(BlockObjectWriter.scala:221)
>       at org.apache.spark.storage.DiskBlockObjectWriter$TimeTrackingOutputStream.write(BlockObjectWriter.scala:86)
>       at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
>       at org.xerial.snappy.SnappyOutputStream.dumpOutput(SnappyOutputStream.java:300)
>       at org.xerial.snappy.SnappyOutputStream.rawWrite(SnappyOutputStream.java:247)
>       at org.xerial.snappy.SnappyOutputStream.write(SnappyOutputStream.java:107)
>       at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1876)
>       at java.io.ObjectOutputStream$BlockDataOutputStream.writeByte(ObjectOutputStream.java:1914)
>       at java.io.ObjectOutputStream.writeFatalException(ObjectOutputStream.java:1575)
>       at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:350)
>       at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:42)
>       at org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:195)
>       at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$4$$anonfun$apply$2.apply(ExternalSorter.scala:751)
>       at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$4$$anonfun$apply$2.apply(ExternalSorter.scala:750)
>       at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>       at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>       at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$4.apply(ExternalSorter.scala:750)
>       at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$4.apply(ExternalSorter.scala:746)
>       at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>       at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>       at org.apache.spark.util.collection.ExternalSorter.writePartitionedFile(ExternalSorter.scala:746)
>       at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:68)
>       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>       at org.apache.spark.scheduler.Task.run(Task.scala:56)
>       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at java.lang.Thread.run(Thread.java:745)
>
> By the time it happens, the shuffle write size is 0.0 B and the input size
> is 3.4 MB. I wonder what operation could so quickly eat up the entire 5G of
> free disk space.
>
> In addition, the storage level of the entire job is confined to
> MEMORY_ONLY_SER and checkpointing is completely disabled.
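
(For reference, persisting at that storage level looks roughly like the
following minimal sketch, which assumes a SparkContext `sc` is in scope and
uses placeholder data. MEMORY_ONLY_SER keeps cached blocks serialized in
memory and does not spill them to disk, so the cache itself should not be
what fills spark.local.dir; shuffle spill files from the job still can.)

  import org.apache.spark.storage.StorageLevel

  // Serialized, memory-only caching: blocks that don't fit in memory are
  // dropped and recomputed rather than written to disk.
  val cached = sc.parallelize(1 to 1000000).persist(StorageLevel.MEMORY_ONLY_SER)
  cached.count()  // materialize the cache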
