We are having a separate discussion about this, but I don't understand why this is a problem. You're supposed to build Spark for Hadoop 1 if you run it on Hadoop 1, and given the error I am not sure that is happening here. I don't think this should change, as I don't see what it would fix.
Let's please concentrate the follow-up on the JIRA since you already made one.

On Wed, Jun 3, 2015 at 2:26 AM, Shixiong Zhu <zsxw...@gmail.com> wrote:

> Ryan - I sent a PR to fix your issue:
> https://github.com/apache/spark/pull/6599
>
> Edward - I have no idea why the following error happened. "ContextCleaner"
> doesn't use any Hadoop API. Could you try Spark 1.3.0? It's supposed to
> support both Hadoop 1 and Hadoop 2.
>
> * "Exception in thread "Spark Context Cleaner"
> java.lang.NoClassDefFoundError: 0
> at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:149)"
>
> Best Regards,
> Shixiong Zhu
>
> 2015-06-03 0:08 GMT+08:00 Ryan Williams <ryan.blake.willi...@gmail.com>:
>
>> I think this is causing issues upgrading ADAM
>> <https://github.com/bigdatagenomics/adam> to Spark 1.3.1 (cf. adam#690
>> <https://github.com/bigdatagenomics/adam/pull/690#issuecomment-107769383>);
>> attempting to build against Hadoop 1.0.4 yields errors like:
>>
>> 2015-06-02 15:57:44 ERROR Executor:96 - Exception in task 0.0 in stage 0.0 (TID 0)
>> *java.lang.IncompatibleClassChangeError: Found class
>> org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected*
>> at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:95)
>> at org.apache.spark.SparkHadoopWriter.commit(SparkHadoopWriter.scala:106)
>> at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1082)
>> at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1059)
>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>> at org.apache.spark.scheduler.Task.run(Task.scala:64)
>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
>>
>> 2015-06-02 15:57:44 WARN TaskSetManager:71 - Lost task 0.0 in stage 0.0
>> (TID 0, localhost): java.lang.IncompatibleClassChangeError: Found class
>> org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
>> at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:95)
>> at org.apache.spark.SparkHadoopWriter.commit(SparkHadoopWriter.scala:106)
>> at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1082)
>> at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1059)
>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>> at org.apache.spark.scheduler.Task.run(Task.scala:64)
>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
>>
>> TaskAttemptContext is a class in Hadoop 1.0.4, but an interface in Hadoop 2;
>> Spark 1.3.1 expects the interface but is getting the class.
>>
>> It sounds like, while I *can* depend on Spark 1.3.1 and Hadoop 1.0.4, I
>> then need to hope that I don't exercise certain Spark code paths that run
>> afoul of differences between Hadoop 1 and 2; does that seem correct?
>>
>> Thanks!
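
A minimal sketch of the dependency setup Ryan is describing (sbt is assumed here purely for illustration; these coordinates come from this thread, not from the actual ADAM build):

// build.sbt -- a sketch only, not the ADAM build definition.
libraryDependencies ++= Seq(
  // The published spark-core_2.10:1.3.1 artifact pulls in hadoop-client 2.2.0
  // transitively (as Edward observed below), so the Hadoop 1 client has to be
  // excluded and re-added explicitly to compile the application against 1.0.4.
  ("org.apache.spark" %% "spark-core" % "1.3.1")
    .exclude("org.apache.hadoop", "hadoop-client"),
  "org.apache.hadoop" % "hadoop-client" % "1.0.4"
)

Note that excluding and re-pinning hadoop-client like this only changes which Hadoop classes are on the application classpath; the published Spark 1.3.1 bytecode was still compiled against the Hadoop 2 TaskAttemptContext interface, which is why the commitTask path above can still fail at runtime.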
>> On Wed, May 20, 2015 at 1:52 PM Sean Owen <so...@cloudera.com> wrote:
>>
>>> I don't think any of those problems are related to Hadoop. Have you
>>> looked at the userClassPathFirst settings?
>>>
>>> On Wed, May 20, 2015 at 6:46 PM, Edward Sargisson <ejsa...@gmail.com> wrote:
>>>
>>>> Hi Sean and Ted,
>>>> Thanks for your replies.
>>>>
>>>> I don't have our current problems nicely written up as good questions
>>>> yet. I'm still sorting out classpath issues, etc.
>>>> In case it is of help, I'm seeing:
>>>> * "Exception in thread "Spark Context Cleaner"
>>>> java.lang.NoClassDefFoundError: 0
>>>> at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:149)"
>>>> * We've been having clashing dependencies between a colleague and me
>>>> because of the aforementioned classpath issue.
>>>> * The clashing dependencies are also causing issues with which Jetty
>>>> libraries are available in the classloader from Spark without clashing
>>>> with the libraries we already have.
>>>>
>>>> More anon,
>>>>
>>>> Cheers,
>>>> Edward
>>>>
>>>>
>>>> -------- Original Message --------
>>>> Subject: Re: spark 1.3.1 jars in repo1.maven.org
>>>> Date: 2015-05-20 00:38
>>>> From: Sean Owen <so...@cloudera.com>
>>>> To: Edward Sargisson <esa...@pobox.com>
>>>> Cc: user <user@spark.apache.org>
>>>>
>>>> Yes, the published artifacts can only refer to one version of anything
>>>> (OK, modulo publishing a large number of variants under classifiers).
>>>>
>>>> You aren't intended to rely on Spark's transitive dependencies for
>>>> anything. Compiling against the Spark API has no relation to what
>>>> version of Hadoop it binds against, because that's not part of any API.
>>>> You mark the Spark dependency as "provided" in your build and get all
>>>> the Spark/Hadoop bindings at runtime from your cluster.
>>>>
>>>> What problem are you experiencing?
>>>>
>>>>
>>>> On Wed, May 20, 2015 at 2:17 AM, Edward Sargisson <esa...@pobox.com> wrote:
>>>>
>>>> Hi,
>>>> I'd like to confirm an observation I've just made: specifically, that
>>>> Spark is only available in repo1.maven.org for one Hadoop variant.
>>>>
>>>> The Spark source can be compiled against a number of different Hadoops
>>>> using profiles. Yay.
>>>> However, the Spark jars in repo1.maven.org appear to be compiled against
>>>> one specific Hadoop, and no other differentiation is made. (I can see a
>>>> difference, with hadoop-client being 2.2.0 in repo1.maven.org and 1.0.4
>>>> in the version I compiled locally.)
>>>>
>>>> The implication here is that if you have a pom file asking for
>>>> spark-core_2.10 version 1.3.1, then Maven will only give you a Hadoop 2
>>>> version. Maven assumes that non-snapshot artifacts never change, so
>>>> trying to load a Hadoop 1 version will end in tears.
>>>>
>>>> This then means that if you compile code against spark-core, there will
>>>> probably be classpath NoClassDefFound issues unless the Hadoop 2 version
>>>> is exactly the one you want.
>>>>
>>>> Have I gotten this correct?
>>>>
>>>> It happens that our little app is using a Spark context directly from a
>>>> Jetty webapp, and the classpath differences were/are causing some
>>>> confusion. We are currently installing a Hadoop 1 Spark master and worker.
>>>>
>>>> Thanks a lot!
>>>> Edward
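
For reference, the userClassPathFirst settings Sean mentions can be set on the SparkConf of an embedded context like Edward's Jetty webapp. A minimal sketch follows; the configuration key names are the Spark 1.3-era ones and the app/master values are made up, so verify both against the docs for the version in use:

import org.apache.spark.{SparkConf, SparkContext}

object UserClassPathFirstSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("embedded-spark-from-jetty") // hypothetical app name
      .setMaster("local[*]")                   // local master for illustration only
      // Ask Spark to prefer the application's classes over the ones shipped
      // with Spark when both are on the classpath.
      .set("spark.driver.userClassPathFirst", "true")
      .set("spark.executor.userClassPathFirst", "true")
    val sc = new SparkContext(conf)
    try {
      // ... run jobs with sc ...
    } finally {
      sc.stop()
    }
  }
}

These settings make Spark resolve classes from the application's jars before its own, which is one way to work around clashing Jetty or other shared dependencies, though they were documented as experimental around that release.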