Ryan - I sent a PR to fix your issue: https://github.com/apache/spark/pull/6599

Edward - I have no idea why the following error happened. ContextCleaner
doesn't use any Hadoop API. Could you try Spark 1.3.0? It's supposed to
support both Hadoop 1 and Hadoop 2.

* "Exception in thread "Spark Context Cleaner" java.lang.NoClassDefFoundError: 0
at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:149)"

Best Regards,
Shixiong Zhu

2015-06-03 0:08 GMT+08:00 Ryan Williams <ryan.blake.willi...@gmail.com>:

> I think this is causing issues upgrading ADAM
> <https://github.com/bigdatagenomics/adam> to Spark 1.3.1 (cf. adam#690
> <https://github.com/bigdatagenomics/adam/pull/690#issuecomment-107769383>);
> attempting to build against Hadoop 1.0.4 yields errors like:
>
> 2015-06-02 15:57:44 ERROR Executor:96 - Exception in task 0.0 in stage 0.0 (TID 0)
> *java.lang.IncompatibleClassChangeError: Found class
> org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected*
>   at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:95)
>   at org.apache.spark.SparkHadoopWriter.commit(SparkHadoopWriter.scala:106)
>   at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1082)
>   at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1059)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>   at org.apache.spark.scheduler.Task.run(Task.scala:64)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> 2015-06-02 15:57:44 WARN TaskSetManager:71 - Lost task 0.0 in stage 0.0
> (TID 0, localhost): java.lang.IncompatibleClassChangeError: Found class
> org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
> [same stack trace as above]
>
> TaskAttemptContext is a class in Hadoop 1.0.4, but an interface in Hadoop
> 2; Spark 1.3.1 expects the interface but is getting the class.
>
> It sounds like, while I *can* depend on Spark 1.3.1 and Hadoop 1.0.4, I
> then need to hope that I don't exercise certain Spark code paths that run
> afoul of differences between Hadoop 1 and 2; does that seem correct?
>
> Thanks!
>
> On Wed, May 20, 2015 at 1:52 PM Sean Owen <so...@cloudera.com> wrote:
>
>> I don't think any of those problems are related to Hadoop. Have you
>> looked at the userClassPathFirst settings?
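
A minimal sketch of the userClassPathFirst knobs Sean is referring to, in
Scala. The property names below are the experimental ones from the Spark
1.3-era configuration docs (spark.driver.userClassPathFirst and
spark.executor.userClassPathFirst); treat them as an assumption and check
the docs for your exact version:

    import org.apache.spark.{SparkConf, SparkContext}

    // Ask Spark to prefer the user's jars over its own bundled classes
    // when resolving classes, on both the driver and the executors.
    val conf = new SparkConf()
      .setAppName("user-classpath-first-sketch")
      .set("spark.driver.userClassPathFirst", "true")
      .set("spark.executor.userClassPathFirst", "true")

    val sc = new SparkContext(conf)

    // Note: the driver-side setting must be in place before the driver
    // JVM starts, so in practice both are usually passed at submit time:
    //   spark-submit --conf spark.driver.userClassPathFirst=true \
    //                --conf spark.executor.userClassPathFirst=true ...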
>> On Wed, May 20, 2015 at 6:46 PM, Edward Sargisson <ejsa...@gmail.com>
>> wrote:
>>
>>> Hi Sean and Ted,
>>> Thanks for your replies.
>>>
>>> I don't have our current problems nicely written up as good questions
>>> yet. I'm still sorting out classpath issues, etc.
>>> In case it is of help, I'm seeing:
>>>
>>> * "Exception in thread "Spark Context Cleaner"
>>> java.lang.NoClassDefFoundError: 0
>>> at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:149)"
>>> * We've been having clashing dependencies between a colleague and me
>>> because of the aforementioned classpath issue
>>> * The clashing dependencies are also causing issues with which Jetty
>>> libraries are available in the classloader from Spark without clashing
>>> with the existing libraries we have.
>>>
>>> More anon,
>>>
>>> Cheers,
>>> Edward
>>>
>>> -------- Original Message --------
>>> Subject: Re: spark 1.3.1 jars in repo1.maven.org
>>> Date: 2015-05-20 00:38
>>> From: Sean Owen <so...@cloudera.com>
>>> To: Edward Sargisson <esa...@pobox.com>
>>> Cc: user <user@spark.apache.org>
>>>
>>> Yes, the published artifacts can only refer to one version of anything
>>> (OK, modulo publishing a large number of variants under classifiers).
>>>
>>> You aren't intended to rely on Spark's transitive dependencies for
>>> anything. Compiling against the Spark API has no relation to what
>>> version of Hadoop it binds against, because that's not part of any API.
>>> You can even mark the Spark dependency as "provided" in your build and
>>> get all the Spark/Hadoop bindings at runtime from your cluster.
>>>
>>> What problem are you experiencing?
>>>
>>> On Wed, May 20, 2015 at 2:17 AM, Edward Sargisson <esa...@pobox.com>
>>> wrote:
>>>
>>> Hi,
>>> I'd like to confirm an observation I've just made: specifically, that
>>> Spark is only available in repo1.maven.org for one Hadoop variant.
>>>
>>> The Spark source can be compiled against a number of different Hadoop
>>> versions using profiles. Yay.
>>> However, the Spark jars in repo1.maven.org appear to be compiled against
>>> one specific Hadoop version, and no other differentiation is made. (I
>>> can see a difference: hadoop-client is 2.2.0 in repo1.maven.org and
>>> 1.0.4 in the version I compiled locally.)
>>>
>>> The implication here is that if you have a pom file asking for
>>> spark-core_2.10 version 1.3.1, then Maven will only give you a Hadoop 2
>>> version. Maven assumes that non-snapshot artifacts never change, so
>>> trying to load a Hadoop 1 version will end in tears.
>>>
>>> This then means that if you compile code against spark-core, there will
>>> probably be classpath NoClassDefFound issues unless the Hadoop 2 version
>>> is exactly the one you want.
>>>
>>> Have I gotten this correct?
>>>
>>> It happens that our little app is using a Spark context directly from a
>>> Jetty webapp, and the classpath differences were/are causing some
>>> confusion. We are currently installing a Hadoop 1 Spark master and
>>> worker.
>>>
>>> Thanks a lot!
>>> Edward
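
A minimal sketch of the arrangement Sean describes above: depend on the
Spark API, mark it "provided", and bind explicitly to the Hadoop version
your cluster actually runs. It is shown as an sbt build fragment (Scala);
the Maven equivalent is <scope>provided</scope> plus an <exclusions> entry
for hadoop-client. The coordinates and versions are taken from this thread
and are assumptions to adapt, not a verified recipe:

    // build.sbt (sketch)
    libraryDependencies ++= Seq(
      // Compile against the Spark API only; the cluster supplies the
      // actual Spark/Hadoop binding at runtime.
      ("org.apache.spark" %% "spark-core" % "1.3.1" % "provided")
        // Keep the Hadoop 2 client that the repo1.maven.org spark-core
        // pom pulls in transitively off the compile classpath.
        .exclude("org.apache.hadoop", "hadoop-client"),
      // Bind explicitly to the Hadoop version the cluster runs.
      "org.apache.hadoop" % "hadoop-client" % "1.0.4" % "provided"
    )

Whether this avoids the TaskAttemptContext class-versus-interface clash
Ryan hit still depends on which Spark code paths run at runtime, as he
suspects above.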