The next thing to check is whether you are mixing versions of Scala (2.11 vs
2.12), or, more specifically, whether you are compiling against a different
Scala version than the one being packaged in your assembly.
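
If it helps, a quick way to sanity-check what actually ends up on the runtime
classpath is a throwaway snippet along these lines (a rough, untested sketch;
the class name is made up). Run it with the same classpath as the job and
compare the output against the Scala version your build compiles with:

    // Hypothetical diagnostic: report which scala-library the job actually sees.
    public class ScalaVersionCheck {
      public static void main(String[] args) {
        // Version of the Scala runtime on the classpath, e.g. "2.11.12"
        System.out.println("Scala version: " + scala.util.Properties.versionNumberString());
        // The jar that scala-library classes are being loaded from
        System.out.println("scala-library jar: "
            + scala.Option.class.getProtectionDomain().getCodeSource().getLocation());
      }
    }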

On Fri, Apr 27, 2018 at 3:02 PM, David Ortiz <[email protected]> wrote:

> Alright.  After double-checking all the versions and rebuilding as a fat
> jar, I'm now getting this.
>
> Driver stacktrace:
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1708)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1696)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1695)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1695)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:855)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:855)
>         at scala.Option.foreach(Option.scala:257)
>         at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:855)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1923)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1878)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1867)
>         at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>         at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:671)
>         at org.apache.spark.SparkContext.runJob(SparkContext.scala:2029)
>         at org.apache.spark.SparkContext.runJob(SparkContext.scala:2050)
>         at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
>         at org.apache.spark.internal.io.SparkHadoopMapReduceWriter$.write(SparkHadoopMapReduceWriter.scala:88)
>         ... 19 more
> Caused by: java.lang.AbstractMethodError: org.apache.crunch.impl.spark.fn.CrunchPairTuple2.call(Ljava/lang/Object;)Ljava/util/Iterator;
>         at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186)
>         at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186)
>         at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797)
>         at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>         at org.apache.spark.scheduler.Task.run(Task.scala:108)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         ... 1 more
>
> On Thu, Apr 26, 2018 at 6:54 PM David Ortiz <[email protected]> wrote:
>
>> Oh wow. I'll take a look tomorrow morning and see if I can figure it out.
>>
>> On Thu, Apr 26, 2018, 6:08 PM Josh Wills <[email protected]> wrote:
>>
>>> It means that a hadoop1 dependency is getting into the jar somehow,
>>> although it's not obvious to me how. Do you have a dependency tree you
>>> can tease apart?
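>>>
>>> If it helps, a quick runtime check (just a rough, untested sketch with a
>>> made-up class name) is to ask the JVM what shape JobContext has and which
>>> jar it came from; in hadoop1 JobContext is a class and in hadoop2 it is an
>>> interface, which is what the "Found interface ... but class was expected"
>>> error is about:
>>>
>>>     import org.apache.hadoop.mapreduce.JobContext;
>>>
>>>     public class HadoopApiCheck {
>>>       public static void main(String[] args) {
>>>         // true on a hadoop2 classpath, false on hadoop1
>>>         System.out.println("JobContext is an interface: "
>>>             + JobContext.class.isInterface());
>>>         // The jar JobContext was loaded from points at the offending dependency
>>>         System.out.println("Loaded from: "
>>>             + JobContext.class.getProtectionDomain().getCodeSource().getLocation());
>>>       }
>>>     }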
>>>
>>> On Thu, Apr 26, 2018 at 12:17 PM, David Ortiz <[email protected]> wrote:
>>>
>>>> Hello,
>>>>
>>>>      I am experimenting with running a MapReduce pipeline that we've had
>>>> in production for a while on Spark.  When I switch to the Spark pipeline
>>>> and try to run it, I run into the following exception:
>>>>
>>>> Exception in thread "Thread-32" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
>>>>         at org.apache.crunch.impl.mr.run.CrunchInputFormat.getSplits(CrunchInputFormat.java:44)
>>>>         at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:124)
>>>>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>>>>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>>>>         at scala.Option.getOrElse(Option.scala:120)
>>>>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>>>>         at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>>>>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>>>>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>>>>         at scala.Option.getOrElse(Option.scala:120)
>>>>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>>>>         at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>>>>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>>>>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>>>>         at scala.Option.getOrElse(Option.scala:120)
>>>>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>>>>         at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>>>>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>>>>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>>>>         at scala.Option.getOrElse(Option.scala:120)
>>>>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>>>>         at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>>>>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>>>>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>>>>         at scala.Option.getOrElse(Option.scala:120)
>>>>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>>>>         at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>>>>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>>>>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>>>>         at scala.Option.getOrElse(Option.scala:120)
>>>>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>>>>         at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>>>>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>>>>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>>>>         at scala.Option.getOrElse(Option.scala:120)
>>>>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>>>>         at org.apache.spark.SparkContext.runJob(SparkContext.scala:1952)
>>>>         at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1144)
>>>>         at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1074)
>>>>         at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1074)
>>>>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>>>>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>>>>         at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
>>>>         at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1074)
>>>>         at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply$mcV$sp(PairRDDFunctions.scala:994)
>>>>         at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply(PairRDDFunctions.scala:985)
>>>>         at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply(PairRDDFunctions.scala:985)
>>>>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>>>>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>>>>         at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
>>>>         at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:985)
>>>>         at org.apache.spark.api.java.JavaPairRDD.saveAsNewAPIHadoopFile(JavaPairRDD.scala:800)
>>>>         at org.apache.crunch.impl.spark.SparkRuntime.monitorLoop(SparkRuntime.java:321)
>>>>         at org.apache.crunch.impl.spark.SparkRuntime.access$000(SparkRuntime.java:77)
>>>>         at org.apache.crunch.impl.spark.SparkRuntime$2.run(SparkRuntime.java:136)
>>>>         at java.lang.Thread.run(Thread.java:745)
>>>>
>>>> This happens both on EMR 5.12.0 using the 0.15.0 artifacts and on a
>>>> non-production CDH cluster running CDH 5.13.1 parcels.
>>>>
>>>> Any idea what would cause this?
>>>>
>>>> Thanks,
>>>>      Dave
>>>>
>>>>
>>>>
>>>
