Oh wow. I'll take a look tomorrow morning and see if I can figure it out.

On Thu, Apr 26, 2018, 6:08 PM Josh Wills <[email protected]> wrote:
> It means that a hadoop1 dependency is getting into the jar somehow,
> although it's not obvious to me how... do you have a dependency tree you
> can tease apart?
>
> On Thu, Apr 26, 2018 at 12:17 PM, David Ortiz <[email protected]> wrote:
>
>> Hello,
>>
>> I am playing around with trying to run a mapreduce pipeline we've had in
>> production for a little while on Spark. When I switch to the spark
>> pipeline and try to run it, I run into the following exception:
>>
>> Exception in thread "Thread-32" java.lang.IncompatibleClassChangeError:
>> Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
>>     at org.apache.crunch.impl.mr.run.CrunchInputFormat.getSplits(CrunchInputFormat.java:44)
>>     at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:124)
>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>>     at scala.Option.getOrElse(Option.scala:120)
>>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>>     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>>     at scala.Option.getOrElse(Option.scala:120)
>>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>>     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>>     at scala.Option.getOrElse(Option.scala:120)
>>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>>     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>>     at scala.Option.getOrElse(Option.scala:120)
>>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>>     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>>     at scala.Option.getOrElse(Option.scala:120)
>>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>>     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>>     at scala.Option.getOrElse(Option.scala:120)
>>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>>     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>>     at scala.Option.getOrElse(Option.scala:120)
>>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>>     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>>     at scala.Option.getOrElse(Option.scala:120)
>>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:1952)
>>     at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1144)
>>     at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1074)
>>     at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1074)
>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>>     at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
>>     at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1074)
>>     at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply$mcV$sp(PairRDDFunctions.scala:994)
>>     at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply(PairRDDFunctions.scala:985)
>>     at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply(PairRDDFunctions.scala:985)
>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>>     at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
>>     at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:985)
>>     at org.apache.spark.api.java.JavaPairRDD.saveAsNewAPIHadoopFile(JavaPairRDD.scala:800)
>>     at org.apache.crunch.impl.spark.SparkRuntime.monitorLoop(SparkRuntime.java:321)
>>     at org.apache.crunch.impl.spark.SparkRuntime.access$000(SparkRuntime.java:77)
>>     at org.apache.crunch.impl.spark.SparkRuntime$2.run(SparkRuntime.java:136)
>>     at java.lang.Thread.run(Thread.java:745)
>>
>> This happens both on EMR 5.12.0 using the 0.15.0 artifacts, as well as on
>> a non-production CDH cluster running CDH 5.13.1 parcels.
>>
>> Any idea what would cause this?
>>
>> Thanks,
>> Dave
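For anyone who finds this thread later: the error signature is consistent with Josh's diagnosis. org.apache.hadoop.mapreduce.JobContext was a concrete class in Hadoop 1 and became an interface in Hadoop 2, so "Found interface ... but class was expected" means the code throwing it (here CrunchInputFormat) was compiled against the Hadoop 1 class hierarchy while a Hadoop 2 JobContext was loaded at runtime. A minimal sketch of the dependency-tree check Josh suggests, assuming a Maven build (the jar path below is a placeholder for your assembled artifact):

    # List every org.apache.hadoop artifact Maven resolves. Hadoop 1 ships the
    # monolithic hadoop-core artifact; Hadoop 2 splits it into hadoop-common,
    # hadoop-mapreduce-client-core, and friends, so a hadoop-core entry (or a
    # hadoop1-built Crunch artifact pulling it in) is the likely culprit.
    mvn dependency:tree -Dincludes=org.apache.hadoop

    # Also worth checking the assembled jar directly for bundled Hadoop classes:
    jar tf target/pipeline-job.jar | grep -i hadoop

Once the tree shows which dependency is dragging in the Hadoop 1 artifact, adding an <exclusion> for org.apache.hadoop:hadoop-core under that dependency in the POM (or the equivalent exclude in Gradle/sbt) should keep it out of the job jar.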
