I used to receive the first error while I was using the local multicore config. For me it was always accompanied by a "Too many open files" error. This was because the unix environment I was running my code on has a limit on the maximum number of files a user can have open, and the files in question were the temporary files created by Spark (presumably during shuffle operations).
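In case it helps anyone hitting the same symptom, here is a minimal sketch of how to inspect and raise that per-process file-descriptor limit from the shell before launching Spark (the value 16384 is just an example, and raising the hard limit itself typically needs root via /etc/security/limits.conf):

```shell
# Show the current soft limit on open file descriptors for this shell
ulimit -n

# Show the hard limit, i.e. the ceiling a non-root user may raise the soft limit to
ulimit -Hn

# Raise the soft limit for this session (example value), then start Spark from the same shell
ulimit -n 16384
```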
Interestingly, I have never gotten the second error. I'm sorry I don't have a solution for your problem; maybe someone else can be a useful resource. But I am equally curious about what happens in Spark's multicore local config.

Thx
Vijay Gaikwad
University of Washington
[email protected]

On Nov 29, 2013, at 9:35 AM, Matteo Ceccarello <[email protected]> wrote:

> Hi all,
>
> I'm experiencing the following issue with spark 0.8.0-incubating. I'm running
> spark in local mode on a multicore machine. As long as I use up to 16
> processors, everything works fine. If I try to run the same code with 32
> processors, I get the following two exceptions:
>
> 18:18:32.268 [delete Spark local dirs] ERROR org.apache.spark.storage.DiskStore - Exception while deleting local spark dir: /ext/ceccarel/spark-graph/tmp/spark-local-20131129181824-2d99
> java.io.IOException: Failed to list files for dir: /ext/ceccarel/spark-graph/tmp/spark-local-20131129181824-2d99
>     at org.apache.spark.util.Utils$.listFilesSafely(Utils.scala:463) ~[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:473) ~[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at org.apache.spark.storage.DiskStore$$anon$1$$anonfun$run$2.apply(DiskStore.scala:303) [spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at org.apache.spark.storage.DiskStore$$anon$1$$anonfun$run$2.apply(DiskStore.scala:301) [spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:34) [spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at scala.collection.mutable.ArrayOps.foreach(ArrayOps.scala:38) [spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at org.apache.spark.storage.DiskStore$$anon$1.run(DiskStore.scala:301) [spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>
> and
>
> 18:18:32.277 [spark-akka.actor.default-dispatcher-5] ERROR o.a.spark.scheduler.local.LocalActor - key not found: 67
> java.util.NoSuchElementException: key not found: 67
>     at scala.collection.MapLike$class.default(MapLike.scala:225) ~[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at scala.collection.mutable.HashMap.default(HashMap.scala:45) ~[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at scala.collection.MapLike$class.apply(MapLike.scala:135) ~[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at scala.collection.mutable.HashMap.apply(HashMap.scala:45) ~[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at org.apache.spark.scheduler.local.LocalScheduler.statusUpdate(LocalScheduler.scala:261) ~[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at org.apache.spark.scheduler.local.LocalActor$$anonfun$receive$1.apply(LocalScheduler.scala:59) ~[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at org.apache.spark.scheduler.local.LocalActor$$anonfun$receive$1.apply(LocalScheduler.scala:54) ~[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at akka.actor.Actor$class.apply(Actor.scala:318) ~[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at org.apache.spark.scheduler.local.LocalActor.apply(LocalScheduler.scala:52) ~[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at akka.actor.ActorCell.invoke(ActorCell.scala:626) [spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:197) [spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at akka.dispatch.Mailbox.run(Mailbox.scala:179) [spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at akka.dispatch.ForkJoinExecutorConfigurator$MailboxExecutionTask.exec(AbstractDispatcher.scala:516) [spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at akka.jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:259) [spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at akka.jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:975) [spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at akka.jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479) [spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at akka.jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104) [spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>
> I have observed that these exceptions (there are several of the second type)
> occur as soon as I perform some kind of shuffle operation. That is, as soon
> as a reduceByKey operation is performed, I get the exception. If I try to
> partition the dataset at the very beginning of the algorithm, using either a
> HashPartitioner or a RangePartitioner, the program fails immediately. So I
> guess it's something related to shuffling.
>
> Has anyone experienced something similar? Do you have any pointers on how to
> solve this problem?
>
> Thank you very much
> Matteo
