Hi all,

I'm experiencing the following issue with Spark 0.8.0-incubating, running in local mode on a multicore machine. As long as I use up to 16 worker threads (local[16]), everything works fine; if I run the same code with 32 threads, I get the following two exceptions:

18:18:32.268 [delete Spark local dirs] ERROR 
org.apache.spark.storage.DiskStore - Exception while deleting local spark dir: 
/ext/ceccarel/spark-graph/tmp/spark-local-20131129181824-2d99
java.io.IOException: Failed to list files for dir: 
/ext/ceccarel/spark-graph/tmp/spark-local-20131129181824-2d99
        at org.apache.spark.util.Utils$.listFilesSafely(Utils.scala:463) 
~[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
        at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:473) 
~[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
        at 
org.apache.spark.storage.DiskStore$$anon$1$$anonfun$run$2.apply(DiskStore.scala:303)
 [spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
        at 
org.apache.spark.storage.DiskStore$$anon$1$$anonfun$run$2.apply(DiskStore.scala:301)
 [spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
        at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:34)
 [spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
        at scala.collection.mutable.ArrayOps.foreach(ArrayOps.scala:38) 
[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
        at org.apache.spark.storage.DiskStore$$anon$1.run(DiskStore.scala:301) 
 [spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]

and

18:18:32.277 [spark-akka.actor.default-dispatcher-5] ERROR 
o.a.spark.scheduler.local.LocalActor - key not found: 67
java.util.NoSuchElementException: key not found: 67
        at scala.collection.MapLike$class.default(MapLike.scala:225) 
~[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
        at scala.collection.mutable.HashMap.default(HashMap.scala:45) 
~[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
        at scala.collection.MapLike$class.apply(MapLike.scala:135) 
~[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
        at scala.collection.mutable.HashMap.apply(HashMap.scala:45) 
~[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
        at 
org.apache.spark.scheduler.local.LocalScheduler.statusUpdate(LocalScheduler.scala:261)
 ~[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
        at 
org.apache.spark.scheduler.local.LocalActor$$anonfun$receive$1.apply(LocalScheduler.scala:59)
 ~[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
        at 
org.apache.spark.scheduler.local.LocalActor$$anonfun$receive$1.apply(LocalScheduler.scala:54)
 ~[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
        at akka.actor.Actor$class.apply(Actor.scala:318) 
~[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
        at 
org.apache.spark.scheduler.local.LocalActor.apply(LocalScheduler.scala:52) 
~[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
        at akka.actor.ActorCell.invoke(ActorCell.scala:626) 
[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:197) 
[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
        at akka.dispatch.Mailbox.run(Mailbox.scala:179) 
[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
        at 
akka.dispatch.ForkJoinExecutorConfigurator$MailboxExecutionTask.exec(AbstractDispatcher.scala:516)
 [spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
        at akka.jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:259) 
[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
        at akka.jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:975) 
[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
        at akka.jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479) 
[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]
        at akka.jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104) 
[spark-graph-assembly-0.4.0-SNAPSHOT.jar:0.4.0-SNAPSHOT]

I have observed that these exceptions (there are several of the second type) occur as soon as some kind of shuffle operation is performed. That is, the moment a reduceByKey runs, the exceptions appear. If I instead partition the dataset at the very beginning of the algorithm, using either a HashPartitioner or a RangePartitioner, the program fails immediately. So I suspect it is something related to shuffling.
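For reference, a minimal sketch of the kind of job that triggers the failure for me (the names, dataset, and sizes here are illustrative, not my actual code):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._  // pair-RDD operations such as reduceByKey (Spark 0.8)

object ShuffleRepro {
  def main(args: Array[String]) {
    // local[32] runs the local scheduler with 32 worker threads;
    // the same job completes without errors under local[16].
    val sc = new SparkContext("local[32]", "shuffle-repro")
    val counts = sc.parallelize(1 to 1000000)
      .map(x => (x % 100, 1))
      .reduceByKey(_ + _)  // first shuffle: this is where the exceptions start
      .collect()
    sc.stop()
  }
}
```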

Has anyone experienced something similar? Do you have any pointers on how to solve this problem?

Thank you very much,
Matteo