Ryan - I sent a PR to fix your issue: https://github.com/apache/spark/pull/6599

Edward - I have no idea why the following error happened. ContextCleaner
doesn't use any Hadoop API. Could you try Spark 1.3.0? It's supposed to
support both Hadoop 1 and Hadoop 2.

* "Exception in thread "Spark Context Cleaner" java.lang.NoClassDefFoundError: 0
at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:149)"

Best Regards,
Shixiong Zhu

2015-06-03 0:08 GMT+08:00 Ryan Williams <ryan.blake.willi...@gmail.com>:

> I think this is causing issues upgrading ADAM
> <https://github.com/bigdatagenomics/adam> to Spark 1.3.1 (cf. adam#690
> <https://github.com/bigdatagenomics/adam/pull/690#issuecomment-107769383>);
> attempting to build against Hadoop 1.0.4 yields errors like:
>
> 2015-06-02 15:57:44 ERROR Executor:96 - Exception in task 0.0 in stage 0.0 (TID 0)
> *java.lang.IncompatibleClassChangeError: Found class
> org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected*
>   at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:95)
>   at org.apache.spark.SparkHadoopWriter.commit(SparkHadoopWriter.scala:106)
>   at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1082)
>   at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1059)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>   at org.apache.spark.scheduler.Task.run(Task.scala:64)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> 2015-06-02 15:57:44 WARN TaskSetManager:71 - Lost task 0.0 in stage 0.0
> (TID 0, localhost): java.lang.IncompatibleClassChangeError: Found class
> org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
> [same stack trace as above]
>
> TaskAttemptContext is a class in Hadoop 1.0.4, but an interface in Hadoop
> 2; Spark 1.3.1 expects the interface but is getting the class.
>
> It sounds like, while I *can* depend on Spark 1.3.1 and Hadoop 1.0.4, I
> then need to hope that I don't exercise certain Spark code paths that run
> afoul of differences between Hadoop 1 and 2; does that seem correct?
>
> Thanks!
>
> On Wed, May 20, 2015 at 1:52 PM Sean Owen <so...@cloudera.com> wrote:
>
>> I don't think any of those problems are related to Hadoop. Have you
>> looked at the userClassPathFirst settings?
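
A minimal sketch of the userClassPathFirst knobs Sean is referring to, in
Scala. The property names below are the experimental ones from the Spark
1.3-era configuration docs (spark.driver.userClassPathFirst and
spark.executor.userClassPathFirst); treat them as an assumption and check
the docs for your exact version:

    import org.apache.spark.{SparkConf, SparkContext}

    // Ask Spark to prefer the user's jars over its own bundled classes
    // when resolving classes, on both the driver and the executors.
    val conf = new SparkConf()
      .setAppName("user-classpath-first-sketch")
      .set("spark.driver.userClassPathFirst", "true")
      .set("spark.executor.userClassPathFirst", "true")

    val sc = new SparkContext(conf)

    // Note: the driver-side setting must be in place before the driver
    // JVM starts, so in practice both are usually passed at submit time:
    //   spark-submit --conf spark.driver.userClassPathFirst=true \
    //                --conf spark.executor.userClassPathFirst=true ...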
>> On Wed, May 20, 2015 at 6:46 PM, Edward Sargisson <ejsa...@gmail.com>
>> wrote:
>>
>>> Hi Sean and Ted,
>>> Thanks for your replies.
>>>
>>> I don't have our current problems nicely written up as good questions
>>> yet. I'm still sorting out classpath issues, etc.
>>> In case it is of help, I'm seeing:
>>>
>>> * "Exception in thread "Spark Context Cleaner"
>>> java.lang.NoClassDefFoundError: 0
>>> at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:149)"
>>> * We've been having clashing dependencies between a colleague and me
>>> because of the aforementioned classpath issue
>>> * The clashing dependencies are also causing issues with which Jetty
>>> libraries are available in the classloader from Spark without clashing
>>> with the existing libraries we have.
>>>
>>> More anon,
>>>
>>> Cheers,
>>> Edward
>>>
>>> -------- Original Message --------
>>> Subject: Re: spark 1.3.1 jars in repo1.maven.org
>>> Date: 2015-05-20 00:38
>>> From: Sean Owen <so...@cloudera.com>
>>> To: Edward Sargisson <esa...@pobox.com>
>>> Cc: user <user@spark.apache.org>
>>>
>>> Yes, the published artifacts can only refer to one version of anything
>>> (OK, modulo publishing a large number of variants under classifiers).
>>>
>>> You aren't intended to rely on Spark's transitive dependencies for
>>> anything. Compiling against the Spark API has no relation to what
>>> version of Hadoop it binds against, because that's not part of any API.
>>> You can even mark the Spark dependency as "provided" in your build and
>>> get all the Spark/Hadoop bindings at runtime from your cluster.
>>>
>>> What problem are you experiencing?
>>>
>>> On Wed, May 20, 2015 at 2:17 AM, Edward Sargisson <esa...@pobox.com>
>>> wrote:
>>>
>>> Hi,
>>> I'd like to confirm an observation I've just made: specifically, that
>>> Spark is only available in repo1.maven.org for one Hadoop variant.
>>>
>>> The Spark source can be compiled against a number of different Hadoop
>>> versions using profiles. Yay.
>>> However, the Spark jars in repo1.maven.org appear to be compiled against
>>> one specific Hadoop version, and no other differentiation is made. (I
>>> can see a difference: hadoop-client is 2.2.0 in repo1.maven.org and
>>> 1.0.4 in the version I compiled locally.)
>>>
>>> The implication here is that if you have a pom file asking for
>>> spark-core_2.10 version 1.3.1, then Maven will only give you a Hadoop 2
>>> version. Maven assumes that non-snapshot artifacts never change, so
>>> trying to load a Hadoop 1 version will end in tears.
>>>
>>> This then means that if you compile code against spark-core, there will
>>> probably be classpath NoClassDefFound issues unless the Hadoop 2 version
>>> is exactly the one you want.
>>>
>>> Have I gotten this correct?
>>>
>>> It happens that our little app is using a Spark context directly from a
>>> Jetty webapp, and the classpath differences were/are causing some
>>> confusion. We are currently installing a Hadoop 1 Spark master and
>>> worker.
>>>
>>> Thanks a lot!
>>> Edward
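
A minimal sketch of the arrangement Sean describes above: depend on the
Spark API, mark it "provided", and bind explicitly to the Hadoop version
your cluster actually runs. It is shown as an sbt build fragment (Scala);
the Maven equivalent is <scope>provided</scope> plus an <exclusions> entry
for hadoop-client. The coordinates and versions are taken from this thread
and are assumptions to adapt, not a verified recipe:

    // build.sbt (sketch)
    libraryDependencies ++= Seq(
      // Compile against the Spark API only; the cluster supplies the
      // actual Spark/Hadoop binding at runtime.
      ("org.apache.spark" %% "spark-core" % "1.3.1" % "provided")
        // Keep the Hadoop 2 client that the repo1.maven.org spark-core
        // pom pulls in transitively off the compile classpath.
        .exclude("org.apache.hadoop", "hadoop-client"),
      // Bind explicitly to the Hadoop version the cluster runs.
      "org.apache.hadoop" % "hadoop-client" % "1.0.4" % "provided"
    )

Whether this avoids the TaskAttemptContext class-versus-interface clash
Ryan hit still depends on which Spark code paths run at runtime, as he
suspects above.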