Sean,

Yes, the problem is exactly the anonymous-function mismatch you described.

So if a Spark app (the driver) depends on a Spark module jar (for example, spark-core) to communicate programmatically with a Spark cluster, the user should not use the pre-built Spark binary. Instead, they should build Spark from source, publish the module jars to the local Maven repo, and then build the app against those jars so that the binaries are the same. It makes no sense to publish Spark module jars to the central Maven repo, because binary compatibility with a Spark cluster of the same version is not ensured.

Is my understanding correct?

-----Original Message-----
From: Sean Owen [mailto:so...@cloudera.com]
Sent: Wednesday, December 17, 2014 8:39 PM
To: Sun, Rui
Cc: user@spark.apache.org
Subject: Re: weird bytecode incompatability issue between spark-core jar from mvn repo and official spark prebuilt binary

You should use the same binaries everywhere. The problem here is that anonymous functions can get compiled to different names in different builds, so you actually end up with one function being called when another is meant.

On Wed, Dec 17, 2014 at 12:07 PM, Sun, Rui <rui....@intel.com> wrote:
> Hi,
>
> I encountered a weird bytecode incompatibility issue between the
> spark-core jar from the Maven repo and the official pre-built Spark binary.
>
> Steps to reproduce:
>
> 1. Download the official pre-built Spark binary 1.1.1 at
>    http://d3kbcqa49mib13.cloudfront.net/spark-1.1.1-bin-hadoop1.tgz
>
> 2. Launch the Spark cluster in pseudo-cluster mode
>
> 3. A small Scala app which calls RDD.saveAsObjectFile()
>
> scalaVersion := "2.10.4"
>
> libraryDependencies ++= Seq(
>   "org.apache.spark" %% "spark-core" % "1.1.1"
> )
>
> val sc = new SparkContext(args(0), "test") // args(0) is the Spark master URI
> val rdd = sc.parallelize(List(1, 2, 3))
> rdd.saveAsObjectFile("/tmp/mysaoftmp")
> sc.stop()
>
> throws an exception as follows:
>
> [error] (run-main-0) org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 6, ray-desktop.sh.intel.com): java.lang.ClassCastException: scala.Tuple2 cannot be cast to scala.collection.Iterator
> [error]   org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
> [error]   org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
> [error]   org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> [error]   org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> [error]   org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> [error]   org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
> [error]   org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> [error]   org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> [error]   org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
> [error]   org.apache.spark.scheduler.Task.run(Task.scala:54)
> [error]   org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
> [error]   java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> [error]   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> [error]   java.lang.Thread.run(Thread.java:701)
>
> After investigation, I found that this is caused by a bytecode
> incompatibility between RDD.class in spark-core_2.10-1.1.1.jar and
> RDD.class in the pre-built Spark assembly.
>
> This issue also happens with Spark 1.1.0.
>
> Is there anything wrong with my usage of Spark? Or is something wrong in
> the process of deploying Spark module jars to the Maven repo?
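
One common way to follow the advice to "use the same binaries everywhere" is sketched below. It is an assumption on top of this thread, not something either poster proposed: the app compiles against spark-core from Maven but marks it as "provided", so the jar is not bundled with the app and the Spark classes come from the cluster's own distribution at run time. The version numbers are the ones from the report above.

```scala
// build.sbt (sketch, not the original poster's build file)
scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  // "provided": available on the compile classpath only; the Spark classes
  // used at run time are expected to come from the Spark installation that
  // launches the app, so driver and executors load the same RDD.class.
  "org.apache.spark" %% "spark-core" % "1.1.1" % "provided"
)
```

With this setup the app would be packaged with sbt and launched through bin/spark-submit from the same pre-built Spark distribution that runs the cluster, rather than started with "sbt run", which puts the Maven jar on the driver's classpath.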