Would that throw the AbstractMethodError on a crunch framework method though?
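For reference, the method descriptor in that error, call(Ljava/lang/Object;)Ljava/util/Iterator;, is the Iterator-returning call() that Spark 2.x uses in its Java function interfaces; in Spark 1.x the same method returned an Iterable. A rough sketch of that shape of problem, with a made-up function name rather than the real CrunchPairTuple2, just to illustrate the mechanism:

    // Hypothetical stand-in for a Crunch pair function; not the actual
    // org.apache.crunch.impl.spark.fn.CrunchPairTuple2 source.
    import java.util.Collections;
    import java.util.Iterator;
    import org.apache.spark.api.java.function.PairFlatMapFunction;
    import scala.Tuple2;

    public class ExamplePairFn implements PairFlatMapFunction<String, String, Integer> {
        // Spark 2.x declares:  Iterator<Tuple2<K, V>> call(T t)
        // Spark 1.x declared:  Iterable<Tuple2<K, V>> call(T t)
        // A class compiled against the 1.x interface only provides the
        // Iterable-returning method, so when a 2.x executor invokes the
        // Iterator-returning call() there is no implementation to run, and it
        // throws AbstractMethodError with the descriptor seen above.
        @Override
        public Iterator<Tuple2<String, Integer>> call(String s) {
            return Collections.singletonList(new Tuple2<>(s, s.length())).iterator();
        }
    }

So a mismatch between the Spark version the crunch-spark artifact was compiled against and the Spark version on the cluster could also produce this, independent of the Scala version question.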
On Fri, Apr 27, 2018 at 4:06 PM Micah Whitacre <[email protected]> wrote:

> The next thing to check is whether you are mixing versions of Scala (2.11
> vs 2.12), or, more specifically, whether you are compiling against a
> different version than the one being packaged in your assembly.
>
> On Fri, Apr 27, 2018 at 3:02 PM, David Ortiz <[email protected]> wrote:
>
>> Alright. After double-checking all the versions and rebuilding as a fat
>> jar, I'm now getting this:
>>
>> Driver stacktrace:
>>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1708)
>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1696)
>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1695)
>>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>>     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1695)
>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:855)
>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:855)
>>     at scala.Option.foreach(Option.scala:257)
>>     at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:855)
>>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1923)
>>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1878)
>>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1867)
>>     at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>>     at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:671)
>>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:2029)
>>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:2050)
>>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
>>     at org.apache.spark.internal.io.SparkHadoopMapReduceWriter$.write(SparkHadoopMapReduceWriter.scala:88)
>>     ... 19 more
>> Caused by: java.lang.AbstractMethodError: org.apache.crunch.impl.spark.fn.CrunchPairTuple2.call(Ljava/lang/Object;)Ljava/util/Iterator;
>>     at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186)
>>     at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186)
>>     at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797)
>>     at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797)
>>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>>     at org.apache.spark.scheduler.Task.run(Task.scala:108)
>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>     ... 1 more
>>
>> On Thu, Apr 26, 2018 at 6:54 PM David Ortiz <[email protected]> wrote:
>>
>>> Oh wow. I'll take a look tomorrow morning and see if I can figure it
>>> out.
>>>
>>> On Thu, Apr 26, 2018, 6:08 PM Josh Wills <[email protected]> wrote:
>>>
>>>> It means that a hadoop1 dependency is getting into the jar somehow,
>>>> although it's not obvious to me how...do you have a dependency tree
>>>> you can tease apart?
>>>>
>>>> On Thu, Apr 26, 2018 at 12:17 PM, David Ortiz <[email protected]> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I am playing around with trying to run a mapreduce pipeline we've
>>>>> had in production for a little while on Spark.
>>>>> When I switch to the Spark pipeline and try to run it, I run into
>>>>> the following exception:
>>>>>
>>>>> Exception in thread "Thread-32" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
>>>>>     at org.apache.crunch.impl.mr.run.CrunchInputFormat.getSplits(CrunchInputFormat.java:44)
>>>>>     at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:124)
>>>>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>>>>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>>>>>     at scala.Option.getOrElse(Option.scala:120)
>>>>>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>>>>>     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>>>>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>>>>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>>>>>     at scala.Option.getOrElse(Option.scala:120)
>>>>>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>>>>>     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>>>>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>>>>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>>>>>     at scala.Option.getOrElse(Option.scala:120)
>>>>>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>>>>>     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>>>>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>>>>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>>>>>     at scala.Option.getOrElse(Option.scala:120)
>>>>>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>>>>>     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>>>>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>>>>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>>>>>     at scala.Option.getOrElse(Option.scala:120)
>>>>>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>>>>>     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>>>>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>>>>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>>>>>     at scala.Option.getOrElse(Option.scala:120)
>>>>>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>>>>>     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>>>>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>>>>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>>>>>     at scala.Option.getOrElse(Option.scala:120)
>>>>>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>>>>>     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>>>>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>>>>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>>>>>     at scala.Option.getOrElse(Option.scala:120)
>>>>>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>>>>>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:1952)
>>>>>     at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1144)
>>>>>     at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1074)
>>>>>     at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1074)
>>>>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>>>>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>>>>>     at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
>>>>>     at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1074)
>>>>>     at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply$mcV$sp(PairRDDFunctions.scala:994)
>>>>>     at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply(PairRDDFunctions.scala:985)
>>>>>     at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply(PairRDDFunctions.scala:985)
>>>>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>>>>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>>>>>     at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
>>>>>     at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:985)
>>>>>     at org.apache.spark.api.java.JavaPairRDD.saveAsNewAPIHadoopFile(JavaPairRDD.scala:800)
>>>>>     at org.apache.crunch.impl.spark.SparkRuntime.monitorLoop(SparkRuntime.java:321)
>>>>>     at org.apache.crunch.impl.spark.SparkRuntime.access$000(SparkRuntime.java:77)
>>>>>     at org.apache.crunch.impl.spark.SparkRuntime$2.run(SparkRuntime.java:136)
>>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>>
>>>>> This happens both on EMR 5.12.0 using the 0.15.0 artifacts, as well as
>>>>> on a non-production CDH cluster running CDH 5.13.1 parcels.
>>>>>
>>>>> Any idea what would cause this?
>>>>>
>>>>> Thanks,
>>>>> Dave
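
A side note on the original IncompatibleClassChangeError: org.apache.hadoop.mapreduce.JobContext was a concrete class in Hadoop 1.x and became an interface in Hadoop 2.x, which is why a stray hadoop1 jar on the classpath produces exactly "Found interface ... but class was expected". If it's unclear which Hadoop actually wins on the runtime classpath, a tiny check along these lines (the class name here is just an illustration), run with the same classpath as the job, prints both the answer and the jar the class was loaded from:

    // Hypothetical classpath check; run it with the same classpath the job uses.
    import org.apache.hadoop.mapreduce.JobContext;

    public class JobContextCheck {
        public static void main(String[] args) {
            Class<?> jc = JobContext.class;
            // Hadoop 1.x: JobContext is a class; Hadoop 2.x: it is an interface.
            System.out.println("JobContext is an interface: " + jc.isInterface());
            // The jar it was loaded from points at whichever dependency pulled it in.
            System.out.println("Loaded from: "
                    + jc.getProtectionDomain().getCodeSource().getLocation());
        }
    }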
