RE: weird bytecode incompatibility issue between spark-core jar from mvn repo and official spark prebuilt binary

Sun, Rui Wed, 17 Dec 2014 19:21:59 -0800

I'm not using spark-submit. The app communicates directly with the Spark cluster
in standalone mode.


If I mark the Spark dependency as 'provided', then a spark-core jar from elsewhere
must be put on the CLASSPATH. However, the pre-built Spark binary ships only an
assembly jar, not the individual module jars. So there is no way to point to a
spark-core jar that is the same binary as the one inside the pre-built Spark
distribution.

Maybe the Spark distribution should contain the individual module jars as well
as the assembly jar. Any opinions?
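
You can check from inside the application which binary the driver actually loaded a class from by asking the JVM for the class's code source. The following is a minimal sketch that assumes only the standard JDK `ProtectionDomain`/`CodeSource` APIs. `JarLocator` and `locationOf` are hypothetical names chosen for illustration.

```scala
// Sketch: report where the JVM loaded a class from. This shows whether the
// driver picked up, e.g., a spark-core jar from Maven or the assembly jar.
// `JarLocator` and `locationOf` are hypothetical names.
object JarLocator {
  /** Code-source URL of `cls`, or None (e.g. for JDK bootstrap classes). */
  def locationOf(cls: Class[_]): Option[String] =
    Option(cls.getProtectionDomain)
      .flatMap(pd => Option(pd.getCodeSource))
      .flatMap(cs => Option(cs.getLocation))
      .map(_.toString)

  def main(args: Array[String]): Unit = {
    // With Spark on the classpath you would pass
    // classOf[org.apache.spark.rdd.RDD[_]]. scala.Option is used here only
    // so the sketch runs without Spark.
    println(s"scala.Option loaded from: ${locationOf(classOf[Option[_]])}")
  }
}
```

Running this on the driver and comparing the result with the jar on the workers makes the "same binary everywhere" requirement concrete.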

From: Shivaram Venkataraman [mailto:shiva...@eecs.berkeley.edu]
Sent: Thursday, December 18, 2014 2:20 AM
To: Sean Owen
Cc: Sun, Rui; user@spark.apache.org
Subject: Re: weird bytecode incompatibility issue between spark-core jar from mvn repo and official spark prebuilt binary

Just to clarify, are you running the application using spark-submit after
packaging it with sbt package? One thing that might help is to mark the Spark
dependency as 'provided', since then the Spark classes won't be bundled into
your jar.

Thanks
Shivaram
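
Here is a minimal build.sbt sketch of Shivaram's suggestion. It reuses the Scala and Spark versions from the original report and is a config fragment, not a complete build:

```scala
// build.sbt (sketch): versions taken from the original report.
scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  // "provided": compile against spark-core but don't bundle it with the app;
  // the Spark classes then have to come from the cluster's installation.
  "org.apache.spark" %% "spark-core" % "1.1.1" % "provided"
)
```

With this scope, whatever launches the driver has to supply the Spark classes at run time. spark-submit does so from the installation's assembly jar, which is the situation discussed in the reply at the top of this thread.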

On Wed, Dec 17, 2014 at 4:39 AM, Sean Owen <so...@cloudera.com> wrote:
You should use the same binaries everywhere. The problem here is that
anonymous functions can potentially get compiled to different (generated) class
names when you build differently, so you actually end up with one function being
called when another function is meant.
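
A toy model can illustrate Sean's explanation. This is a hypothetical sketch, not Spark's serialization code:

- The two maps stand in for two separately compiled builds.
- The key `RDD$$anonfun$13` is borrowed from the stack trace in the original report purely as a label.
- The receiving side resolves a closure by name against its own build, so if the builds numbered their anonymous functions differently, the same name runs different code.

```scala
// Toy model (hypothetical, not Spark code): a closure travels as a generated
// class name and is resolved against the receiver's own build.
object ClosureNameMismatch {
  type Fn = Any => Any

  // Driver's build: here "RDD$$anonfun$13" happens to take a pair.
  val driverBuild: Map[String, Fn] = Map(
    "RDD$$anonfun$13" -> ((x: Any) => x.asInstanceOf[(Any, Any)]._1)
  )

  // Executor's build: the same name is a function over an iterator.
  val executorBuild: Map[String, Fn] = Map(
    "RDD$$anonfun$13" -> ((x: Any) => x.asInstanceOf[Iterator[Any]].size)
  )

  // The executor looks the name up in its own build, not the driver's.
  def runOnExecutor(name: String, input: Any): Any = executorBuild(name)(input)

  def main(args: Array[String]): Unit = {
    val input = (1, 2)
    println(driverBuild("RDD$$anonfun$13")(input)) // what the driver meant: 1
    try runOnExecutor("RDD$$anonfun$13", input)
    catch { case e: ClassCastException => println(s"executor failed: $e") }
  }
}
```

The executor-side failure is a ClassCastException from a Tuple2 to an Iterator. This is the same shape as the error in the report, though the real code path in Spark is more involved than this model.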

On Wed, Dec 17, 2014 at 12:07 PM, Sun, Rui <rui....@intel.com> wrote:
> Hi,
>
> I encountered a weird bytecode incompatibility issue between the spark-core jar
> from the Maven repo and the official Spark pre-built binary.
>
> Steps to reproduce:
>
> 1. Download the official pre-built Spark 1.1.1 binary from
> http://d3kbcqa49mib13.cloudfront.net/spark-1.1.1-bin-hadoop1.tgz
>
> 2. Launch the Spark cluster in pseudo-cluster mode.
>
> 3. A small Scala app that calls RDD.saveAsObjectFile():
>
> scalaVersion := "2.10.4"
>
> libraryDependencies ++= Seq(
>   "org.apache.spark" %% "spark-core" % "1.1.1"
> )
>
> import org.apache.spark.SparkContext
>
> object SaveAsObjectFileTest { // wrapper object name is illustrative
>   def main(args: Array[String]): Unit = {
>     val sc = new SparkContext(args(0), "test") // args(0) is the Spark master URI
>     val rdd = sc.parallelize(List(1, 2, 3))
>     rdd.saveAsObjectFile("/tmp/mysaoftmp")
>     sc.stop()
>   }
> }
>
> throws an exception as follows:
>
> [error] (run-main-0) org.apache.spark.SparkException: Job aborted due to
> stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost
> task 1.3 in stage 0.0 (TID 6, ray-desktop.sh.intel.com):
> java.lang.ClassCastException: scala.Tuple2 cannot be cast to
> scala.collection.Iterator
> [error]         org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
> [error]         org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
> [error]         org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> [error]         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> [error]         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> [error]         org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
> [error]         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> [error]         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> [error]         org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
> [error]         org.apache.spark.scheduler.Task.run(Task.scala:54)
> [error]         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
> [error]         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> [error]         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> [error]         java.lang.Thread.run(Thread.java:701)
>
> After investigation, I found that this is caused by a bytecode incompatibility
> between RDD.class in spark-core_2.10-1.1.1.jar and RDD.class in the pre-built
> Spark assembly.
>
> This issue also happens with Spark 1.1.0.
>
> Is there anything wrong with my usage of Spark, or with the process of
> deploying the Spark module jars to the Maven repo?
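
The comparison described in the original report, RDD.class in spark-core_2.10-1.1.1.jar versus RDD.class in the pre-built assembly, can be done mechanically by hashing the class entry in each jar. The sketch below uses only the JDK (`java.util.jar.JarFile`, `java.security.MessageDigest`). `ClassDigest` and `entryDigest` are hypothetical names, and the assembly jar path is left to the caller because the report does not give it.

```scala
import java.security.MessageDigest
import java.util.jar.JarFile

// Sketch: hash one class entry in a jar so two jars can be compared.
// `ClassDigest` and `entryDigest` are hypothetical names.
object ClassDigest {
  /** Hex SHA-256 of `entry` inside `jarPath`, or None if the entry is absent. */
  def entryDigest(jarPath: String, entry: String): Option[String] = {
    val jar = new JarFile(jarPath)
    try {
      Option(jar.getJarEntry(entry)).map { e =>
        val in = jar.getInputStream(e)
        try {
          val md = MessageDigest.getInstance("SHA-256")
          val buf = new Array[Byte](8192)
          var n = in.read(buf)
          while (n != -1) { md.update(buf, 0, n); n = in.read(buf) }
          md.digest().map(b => f"${b & 0xff}%02x").mkString
        } finally in.close()
      }
    } finally jar.close()
  }

  def main(args: Array[String]): Unit = {
    if (args.length != 2) {
      System.err.println("usage: ClassDigest <spark-core jar> <assembly jar>")
    } else {
      val entry = "org/apache/spark/rdd/RDD.class"
      val da = entryDigest(args(0), entry)
      val db = entryDigest(args(1), entry)
      println(s"${args(0)}: $da\n${args(1)}: $db")
      println(s"identical: ${da.isDefined && da == db}")
    }
  }
}
```

Equal digests mean the two jars carry byte-for-byte identical RDD.class files. Differing digests only show that the bytecode differs, which is consistent with, but not by itself proof of, the closure-naming mismatch discussed above.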