It means that a hadoop1 dependency is getting into the jar somehow, although
it's not obvious to me how. Do you have a dependency tree you can tease
apart?
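
For what it's worth, the mismatch is checkable directly: in Hadoop 1.x,
org.apache.hadoop.mapreduce.JobContext is a class, while in Hadoop 2.x it is
an interface, which is exactly what that IncompatibleClassChangeError is
complaining about. Below is a minimal reflection sketch you could run against
the same classpath as the Spark job (the class name is just for illustration):

    import java.security.CodeSource;

    // Diagnostic sketch: report whether JobContext resolved to the Hadoop 1.x
    // class or the Hadoop 2.x interface, and which jar it was loaded from.
    public class HadoopVersionCheck {
        public static void main(String[] args) throws Exception {
            Class<?> jc = Class.forName("org.apache.hadoop.mapreduce.JobContext");
            System.out.println("JobContext is "
                    + (jc.isInterface() ? "an interface (Hadoop 2.x APIs)"
                                        : "a class (Hadoop 1.x APIs)"));
            // CodeSource can be null for bootstrap-loaded classes, hence the guard.
            CodeSource src = jc.getProtectionDomain().getCodeSource();
            System.out.println("Loaded from: "
                    + (src != null ? src.getLocation() : "<unknown>"));
        }
    }

If that reports a Hadoop 1.x jar, mvn dependency:tree -Dincludes=org.apache.hadoop
(for a Maven build) should point at the transitive dependency pulling it in.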

On Thu, Apr 26, 2018 at 12:17 PM, David Ortiz <[email protected]> wrote:

> Hello,
>
>      I am playing around with running, on Spark, a MapReduce pipeline we've
> had in production for a little while. When I switch to the Spark pipeline
> and run it, I hit the following exception:
>
> Exception in thread "Thread-32" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
>         at org.apache.crunch.impl.mr.run.CrunchInputFormat.getSplits(CrunchInputFormat.java:44)
>         at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:124)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>         at scala.Option.getOrElse(Option.scala:120)
>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>         at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>         at scala.Option.getOrElse(Option.scala:120)
>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>         at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>         at scala.Option.getOrElse(Option.scala:120)
>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>         at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>         at scala.Option.getOrElse(Option.scala:120)
>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>         at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>         at scala.Option.getOrElse(Option.scala:120)
>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>         at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>         at scala.Option.getOrElse(Option.scala:120)
>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>         at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>         at scala.Option.getOrElse(Option.scala:120)
>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>         at org.apache.spark.SparkContext.runJob(SparkContext.scala:1952)
>         at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1144)
>         at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1074)
>         at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1074)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>         at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
>         at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1074)
>         at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply$mcV$sp(PairRDDFunctions.scala:994)
>         at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply(PairRDDFunctions.scala:985)
>         at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply(PairRDDFunctions.scala:985)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>         at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
>         at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:985)
>         at org.apache.spark.api.java.JavaPairRDD.saveAsNewAPIHadoopFile(JavaPairRDD.scala:800)
>         at org.apache.crunch.impl.spark.SparkRuntime.monitorLoop(SparkRuntime.java:321)
>         at org.apache.crunch.impl.spark.SparkRuntime.access$000(SparkRuntime.java:77)
>         at org.apache.crunch.impl.spark.SparkRuntime$2.run(SparkRuntime.java:136)
>         at java.lang.Thread.run(Thread.java:745)
>
> This happens both on EMR 5.12.0 using the 0.15.0 artifacts and on a
> non-production CDH cluster running CDH 5.13.1 parcels.
>
> Any idea what would cause this?
>
> Thanks,
>      Dave
>