We are having a separate discussion about this, but I don't understand why this is a problem. You're supposed to build Spark for Hadoop 1 if you run it on Hadoop 1, and given the error I am not sure that is happening here. I don't think this should change, as I don't see what it would fix.
Let's please concentrate the follow-up on the JIRA since you already made one.

On Wed, Jun 3, 2015 at 2:26 AM, Shixiong Zhu <zsxw...@gmail.com> wrote:

> Ryan - I sent a PR to fix your issue:
> https://github.com/apache/spark/pull/6599
>
> Edward - I have no idea why the following error happened. "ContextCleaner"
> doesn't use any Hadoop API. Could you try Spark 1.3.0? It's supposed to
> support both Hadoop 1 and Hadoop 2.
>
> * "Exception in thread "Spark Context Cleaner"
> java.lang.NoClassDefFoundError: 0
> at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:149)"
>
> Best Regards,
> Shixiong Zhu
>
> 2015-06-03 0:08 GMT+08:00 Ryan Williams <ryan.blake.willi...@gmail.com>:
>
>> I think this is causing issues upgrading ADAM
>> <https://github.com/bigdatagenomics/adam> to Spark 1.3.1 (cf. adam#690
>> <https://github.com/bigdatagenomics/adam/pull/690#issuecomment-107769383>);
>> attempting to build against Hadoop 1.0.4 yields errors like:
>>
>> 2015-06-02 15:57:44 ERROR Executor:96 - Exception in task 0.0 in stage 0.0 (TID 0)
>> *java.lang.IncompatibleClassChangeError: Found class
>> org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected*
>> at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:95)
>> at org.apache.spark.SparkHadoopWriter.commit(SparkHadoopWriter.scala:106)
>> at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1082)
>> at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1059)
>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>> at org.apache.spark.scheduler.Task.run(Task.scala:64)
>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
>>
>> 2015-06-02 15:57:44 WARN TaskSetManager:71 - Lost task 0.0 in stage 0.0
>> (TID 0, localhost): java.lang.IncompatibleClassChangeError: Found class
>> org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
>> at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:95)
>> at org.apache.spark.SparkHadoopWriter.commit(SparkHadoopWriter.scala:106)
>> at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1082)
>> at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1059)
>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>> at org.apache.spark.scheduler.Task.run(Task.scala:64)
>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
>>
>> TaskAttemptContext is a class in Hadoop 1.0.4, but an interface in Hadoop 2;
>> Spark 1.3.1 expects the interface but is getting the class.
>>
>> It sounds like, while I *can* depend on Spark 1.3.1 and Hadoop 1.0.4, I
>> then need to hope that I don't exercise certain Spark code paths that run
>> afoul of differences between Hadoop 1 and 2; does that seem correct?
>>
>> Thanks!
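
A minimal sketch of the dependency setup Ryan is describing (sbt is assumed here purely for illustration; these coordinates come from this thread, not from the actual ADAM build):

// build.sbt -- a sketch only, not the ADAM build definition.
libraryDependencies ++= Seq(
  // The published spark-core_2.10:1.3.1 artifact pulls in hadoop-client 2.2.0
  // transitively (as Edward observed below), so the Hadoop 1 client has to be
  // excluded and re-added explicitly to compile the application against 1.0.4.
  ("org.apache.spark" %% "spark-core" % "1.3.1")
    .exclude("org.apache.hadoop", "hadoop-client"),
  "org.apache.hadoop" % "hadoop-client" % "1.0.4"
)

Note that excluding and re-pinning hadoop-client like this only changes which Hadoop classes are on the application classpath; the published Spark 1.3.1 bytecode was still compiled against the Hadoop 2 TaskAttemptContext interface, which is why the commitTask path above can still fail at runtime.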
>> On Wed, May 20, 2015 at 1:52 PM Sean Owen <so...@cloudera.com> wrote:
>>
>>> I don't think any of those problems are related to Hadoop. Have you
>>> looked at the userClassPathFirst settings?
>>>
>>> On Wed, May 20, 2015 at 6:46 PM, Edward Sargisson <ejsa...@gmail.com> wrote:
>>>
>>>> Hi Sean and Ted,
>>>> Thanks for your replies.
>>>>
>>>> I don't have our current problems nicely written up as good questions
>>>> yet. I'm still sorting out classpath issues, etc.
>>>> In case it is of help, I'm seeing:
>>>> * "Exception in thread "Spark Context Cleaner"
>>>> java.lang.NoClassDefFoundError: 0
>>>> at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:149)"
>>>> * We've been having clashing dependencies between a colleague and me
>>>> because of the aforementioned classpath issue.
>>>> * The clashing dependencies are also causing issues with which Jetty
>>>> libraries are available in the classloader from Spark without clashing
>>>> with the libraries we already have.
>>>>
>>>> More anon,
>>>>
>>>> Cheers,
>>>> Edward
>>>>
>>>>
>>>> -------- Original Message --------
>>>> Subject: Re: spark 1.3.1 jars in repo1.maven.org
>>>> Date: 2015-05-20 00:38
>>>> From: Sean Owen <so...@cloudera.com>
>>>> To: Edward Sargisson <esa...@pobox.com>
>>>> Cc: user <user@spark.apache.org>
>>>>
>>>> Yes, the published artifacts can only refer to one version of anything
>>>> (OK, modulo publishing a large number of variants under classifiers).
>>>>
>>>> You aren't intended to rely on Spark's transitive dependencies for
>>>> anything. Compiling against the Spark API has no relation to what
>>>> version of Hadoop it binds against, because that's not part of any API.
>>>> You mark the Spark dependency as "provided" in your build and get all
>>>> the Spark/Hadoop bindings at runtime from your cluster.
>>>>
>>>> What problem are you experiencing?
>>>>
>>>>
>>>> On Wed, May 20, 2015 at 2:17 AM, Edward Sargisson <esa...@pobox.com> wrote:
>>>>
>>>> Hi,
>>>> I'd like to confirm an observation I've just made: specifically, that
>>>> Spark is only available in repo1.maven.org for one Hadoop variant.
>>>>
>>>> The Spark source can be compiled against a number of different Hadoops
>>>> using profiles. Yay.
>>>> However, the Spark jars in repo1.maven.org appear to be compiled against
>>>> one specific Hadoop, and no other differentiation is made. (I can see a
>>>> difference, with hadoop-client being 2.2.0 in repo1.maven.org and 1.0.4
>>>> in the version I compiled locally.)
>>>>
>>>> The implication here is that if you have a pom file asking for
>>>> spark-core_2.10 version 1.3.1, then Maven will only give you a Hadoop 2
>>>> version. Maven assumes that non-snapshot artifacts never change, so
>>>> trying to load a Hadoop 1 version will end in tears.
>>>>
>>>> This then means that if you compile code against spark-core, there will
>>>> probably be classpath NoClassDefFound issues unless the Hadoop 2 version
>>>> is exactly the one you want.
>>>>
>>>> Have I gotten this correct?
>>>>
>>>> It happens that our little app is using a Spark context directly from a
>>>> Jetty webapp, and the classpath differences were/are causing some
>>>> confusion. We are currently installing a Hadoop 1 Spark master and worker.
>>>>
>>>> Thanks a lot!
>>>> Edward
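
For reference, the userClassPathFirst settings Sean mentions can be set on the SparkConf of an embedded context like Edward's Jetty webapp. A minimal sketch follows; the configuration key names are the Spark 1.3-era ones and the app/master values are made up, so verify both against the docs for the version in use:

import org.apache.spark.{SparkConf, SparkContext}

object UserClassPathFirstSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("embedded-spark-from-jetty") // hypothetical app name
      .setMaster("local[*]")                   // local master for illustration only
      // Ask Spark to prefer the application's classes over the ones shipped
      // with Spark when both are on the classpath.
      .set("spark.driver.userClassPathFirst", "true")
      .set("spark.executor.userClassPathFirst", "true")
    val sc = new SparkContext(conf)
    try {
      // ... run jobs with sc ...
    } finally {
      sc.stop()
    }
  }
}

These settings make Spark resolve classes from the application's jars before its own, which is one way to work around clashing Jetty or other shared dependencies, though they were documented as experimental around that release.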