Good afternoon. I'm attempting to get the word-count example working, and I keep getting an error on the "reduceByKey(_ + _)" call. I've scoured the mailing lists and haven't been able to find a surefire solution, unless I'm missing something big. I did find something close, but it didn't appear to work in my case. The error is:
org.apache.spark.SparkException: Job aborted: Task 2.0:3 failed 4 times (most recent failure: Exception failure: java.lang.ClassNotFoundException: SimpleApp$$anonfun$3)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:619)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

I've commented the reduceByKey line out and back in to confirm it is the cause, and it is: with that line removed, my program compiles and runs with no problem.
With it in, I get the above error across all my nodes. I've also tried the same line in spark-shell, and there it processes correctly, so I assumed I was missing an "import" statement. The closest thing I could find was someone with the same symptoms whose problem was fixed by "import scala.collection.JavaConversions._", but that didn't appear to work for me. Can anyone shed some light on what I'm missing? I'm pulling my hair out over this. The section of code I'm trying to get working is:

    val JCountRes = logData.flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

"logData" is just an RDD pointing to a large (2 GB) file in HDFS.

Thanks,
Ian
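For completeness, here is roughly the shape of the surrounding app. This is a sketch, not my exact code: the master URL, SPARK_HOME, input path, and jar name are all placeholders, following the Spark quick-start pattern of passing the application jar to the SparkContext constructor so it gets shipped to the workers.

```scala
// Sketch of the app around the failing line (placeholders throughout).
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._ // implicits that provide reduceByKey on pair RDDs

object SimpleApp {
  def main(args: Array[String]) {
    val sc = new SparkContext(
      "spark://master:7077",   // placeholder master URL
      "Simple App",
      "/path/to/spark",        // placeholder SPARK_HOME
      List("target/scala-2.10/simple-project_2.10-1.0.jar")) // placeholder application jar

    val logData = sc.textFile("hdfs:///path/to/file") // placeholder HDFS path

    val JCountRes = logData.flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _) // this is the line that triggers the ClassNotFoundException

    println(JCountRes.count())
    sc.stop()
  }
}
```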