Sean,

Yes, the problem is exactly the anonymous function mismatch you described.

So if a Spark app (driver) depends on a Spark module jar (for example, 
spark-core) to communicate programmatically with a Spark cluster, the user 
should not use the pre-built Spark binary. Instead, they should build Spark 
from source, publish the module jars into the local Maven repo, and then 
build the app against those jars so that the binaries match. It makes no 
sense, then, to publish Spark module jars into the central Maven repo, 
because binary compatibility with a Spark cluster of the same version is not 
ensured. Is my understanding correct?
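
For example, here is a minimal sketch of what the app's build.sbt might look 
like under that workflow, assuming the Spark 1.1.1 module jars were built 
from source and installed into the local Maven repo (e.g. with 
"mvn -DskipTests install"); the resolver line and comments are illustrative, 
not taken from this thread:

// build.sbt -- sketch only; assumes locally built Spark jars in ~/.m2/repository
scalaVersion := "2.10.4"

// make the locally installed Spark jars visible to sbt
// (exact resolution order depends on the sbt setup)
resolvers += Resolver.mavenLocal

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.1"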


-----Original Message-----
From: Sean Owen [mailto:so...@cloudera.com] 
Sent: Wednesday, December 17, 2014 8:39 PM
To: Sun, Rui
Cc: user@spark.apache.org
Subject: Re: weird bytecode incompatibility issue between spark-core jar from 
mvn repo and official spark prebuilt binary

You should use the same binaries everywhere. The problem here is that anonymous 
functions can (potentially) get compiled to different names when the two sides 
are built separately, so you actually have one function being called when 
another function is meant.
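
To illustrate the naming (a standalone sketch, not code from this thread; the 
object name Demo is mine): Scala 2.10 compiles each closure to a synthetic 
class with a compiler-assigned number, and independent builds can assign those 
numbers differently, so a closure serialized on the driver can resolve to a 
different class on the executors.

object Demo {
  def main(args: Array[String]): Unit = {
    val inc = (x: Int) => x + 1
    // Prints a synthetic class name such as Demo$$anonfun$main$1;
    // the exact numbering is assigned by the compiler per build.
    println(inc.getClass.getName)
  }
}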

On Wed, Dec 17, 2014 at 12:07 PM, Sun, Rui <rui....@intel.com> wrote:
> Hi,
>
>
>
> I encountered a weird bytecode incompatibility issue between the 
> spark-core jar from the mvn repo and the official Spark pre-built binary.
>
>
>
> Steps to reproduce:
>
> 1.     Download the official pre-built Spark binary 1.1.1 at
> http://d3kbcqa49mib13.cloudfront.net/spark-1.1.1-bin-hadoop1.tgz
>
> 2.     Launch the Spark cluster in pseudo cluster mode
>
> 3.     A small Scala app which calls RDD.saveAsObjectFile(), built and run
> with the following settings and code:
>
> scalaVersion := "2.10.4"
>
> libraryDependencies ++= Seq(
>   "org.apache.spark" %% "spark-core" % "1.1.1"
> )
>
>
>
> import org.apache.spark.SparkContext
>
> // args(0) is the Spark master URI
> val sc = new SparkContext(args(0), "test")
> val rdd = sc.parallelize(List(1, 2, 3))
> rdd.saveAsObjectFile("/tmp/mysaoftmp")
> sc.stop()
>
>
>
> Running the app throws an exception as follows:
>
> [error] (run-main-0) org.apache.spark.SparkException: Job aborted due to
> stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure:
> Lost task 1.3 in stage 0.0 (TID 6, ray-desktop.sh.intel.com):
> java.lang.ClassCastException: scala.Tuple2 cannot be cast to
> scala.collection.Iterator
>
> [error]         org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
> [error]         org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
> [error]         org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> [error]         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> [error]         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> [error]         org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
> [error]         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> [error]         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> [error]         org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
> [error]         org.apache.spark.scheduler.Task.run(Task.scala:54)
> [error]         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
> [error]         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> [error]         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> [error]         java.lang.Thread.run(Thread.java:701)
>
>
>
> After investigation, I found that this is caused by a bytecode 
> incompatibility between the RDD.class in spark-core_2.10-1.1.1.jar from the 
> Maven repo and the one in the pre-built Spark assembly.
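
One way to compare the two artifacts is a small sketch like the following, 
which lists the RDD anonymous-function classes contained in a given jar; this 
is illustrative only, and the jar path is a placeholder:

import java.util.jar.JarFile
import scala.collection.JavaConverters._

object ListAnonFuns {
  def main(args: Array[String]): Unit = {
    // args(0): path to spark-core_2.10-1.1.1.jar or the Spark assembly jar
    val jar = new JarFile(args(0))
    jar.entries().asScala
      .map(_.getName)
      .filter(n => n.contains("rdd/RDD$$anonfun$") && n.endsWith(".class"))
      .toSeq.sorted
      .foreach(println)
    jar.close()
  }
}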
>
>
>
> This issue also happens with Spark 1.1.0.
>
>
>
> Is there anything wrong with my usage of Spark? Or is something wrong in 
> the process of deploying the Spark module jars to the Maven repo?
>
>
