I used to receive the first error while I was using the local multicore config. For me it was always accompanied by a "Too many open files" error. This was because the unix environment I was running my code on has a limit on the maximum number of files a user can have open, and the files in question were the temporary files created by Spark (presumably during shuffle operations).
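In case it helps anyone hitting the same symptom, here is a minimal sketch of how to inspect and raise that per-process file-descriptor limit from the shell before launching Spark (the value 16384 is just an example, and raising the hard limit itself typically needs root via /etc/security/limits.conf):

```shell
# Show the current soft limit on open file descriptors for this shell
ulimit -n

# Show the hard limit, i.e. the ceiling a non-root user may raise the soft limit to
ulimit -Hn

# Raise the soft limit for this session (example value), then start Spark from the same shell
ulimit -n 16384
```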
Interestingly, I have never gotten the second error. I'm sorry I don't have a solution for your problem; maybe someone else can be a useful resource. But I am equally curious about what happens in Spark's multicore local config.

Thx
Vijay Gaikwad
University of Washington
[email protected]

On Nov 29, 2013, at 9:35 AM, Matteo Ceccarello <[email protected]> wrote:

> Hi all,
>
> I'm experiencing the following issue with spark 0.8.0-incubating. I'm running
> spark in local mode on a multicore machine. As long as I use up to 16
> processors, everything works fine. If I try to run the same code with 32
> processors, I get the following two exceptions:
>
> 18:18:32.268 [delete Spark local dirs] ERROR org.apache.spark.storage.DiskStore - Exception while deleting local spark dir: /ext/ceccarel/spark-graph/tmp/spark-local-20131129181824-2d99
> java.io.IOException: Failed to list files for dir: /ext/ceccarel/spark-graph/tmp/spark-local-20131129181824-2d99
>     at org.apache.spark.util.Utils$.listFilesSafely(Utils.scala:463) ~[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:473) ~[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at org.apache.spark.storage.DiskStore$$anon$1$$anonfun$run$2.apply(DiskStore.scala:303) [spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at org.apache.spark.storage.DiskStore$$anon$1$$anonfun$run$2.apply(DiskStore.scala:301) [spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:34) [spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at scala.collection.mutable.ArrayOps.foreach(ArrayOps.scala:38) [spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at org.apache.spark.storage.DiskStore$$anon$1.run(DiskStore.scala:301) [spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>
> and
>
> 18:18:32.277 [spark-akka.actor.default-dispatcher-5] ERROR o.a.spark.scheduler.local.LocalActor - key not found: 67
> java.util.NoSuchElementException: key not found: 67
>     at scala.collection.MapLike$class.default(MapLike.scala:225) ~[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at scala.collection.mutable.HashMap.default(HashMap.scala:45) ~[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at scala.collection.MapLike$class.apply(MapLike.scala:135) ~[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at scala.collection.mutable.HashMap.apply(HashMap.scala:45) ~[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at org.apache.spark.scheduler.local.LocalScheduler.statusUpdate(LocalScheduler.scala:261) ~[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at org.apache.spark.scheduler.local.LocalActor$$anonfun$receive$1.apply(LocalScheduler.scala:59) ~[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at org.apache.spark.scheduler.local.LocalActor$$anonfun$receive$1.apply(LocalScheduler.scala:54) ~[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at akka.actor.Actor$class.apply(Actor.scala:318) ~[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at org.apache.spark.scheduler.local.LocalActor.apply(LocalScheduler.scala:52) ~[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at akka.actor.ActorCell.invoke(ActorCell.scala:626) [spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:197) [spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at akka.dispatch.Mailbox.run(Mailbox.scala:179) [spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at akka.dispatch.ForkJoinExecutorConfigurator$MailboxExecutionTask.exec(AbstractDispatcher.scala:516) [spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at akka.jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:259) [spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at akka.jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:975) [spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at akka.jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479) [spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>     at akka.jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104) [spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
>
> I have observed that these exceptions (there are several of the second type)
> occur as soon as I perform some kind of shuffle operation. That is, as soon
> as a reduceByKey operation is performed, I get the exception. If I try to
> partition the dataset at the very beginning of the algorithm, using either a
> HashPartitioner or a RangePartitioner, the program fails immediately. So I
> guess it's something related to shuffling.
>
> Has anyone experienced something similar? Do you have any pointers on how to
> solve this problem?
>
> Thank you very much
> Matteo
