[jira] [Resolved] (SPARK-3039) Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for hadoop 1 API

2015-02-08 Thread Sean Owen (JIRA)

 [ https://issues.apache.org/jira/browse/SPARK-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-3039.
--
   Resolution: Fixed
Fix Version/s: 1.3.0

Issue resolved by pull request 4315
[https://github.com/apache/spark/pull/4315]

 Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for hadoop 1 API
 --

 Key: SPARK-3039
 URL: https://issues.apache.org/jira/browse/SPARK-3039
 Project: Spark
  Issue Type: Bug
  Components: Build, Input/Output, Spark Core
Affects Versions: 0.9.1, 1.0.0, 1.1.0, 1.2.0
 Environment: hadoop2, hadoop-2.4.0, HDP-2.1
Reporter: Bertrand Bossy
Assignee: Bertrand Bossy
Priority: Critical
 Fix For: 1.3.0


 The Spark assembly contains the artifact org.apache.avro:avro-mapred as a
 dependency of org.spark-project.hive:hive-serde. The avro-mapred package
 provides a Hadoop FileInputFormat for reading and writing Avro files. The
 artifact is published in two variants, distinguished by a Maven classifier:
 the variant for the new Hadoop API (org.apache.hadoop.mapreduce) uses the
 classifier hadoop2, while the variant for the old Hadoop API
 (org.apache.hadoop.mapred) has no classifier.
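 For illustration, an sbt build would select the two variants roughly like
 this (a minimal sketch; the avro-mapred version number is an example, not
 necessarily the one bundled with Spark):
 {code}
 // Old Hadoop API (org.apache.hadoop.mapred): no classifier.
 libraryDependencies += "org.apache.avro" % "avro-mapred" % "1.7.6"
 // New Hadoop API (org.apache.hadoop.mapreduce): classifier "hadoop2".
 libraryDependencies += "org.apache.avro" % "avro-mapred" % "1.7.6" classifier "hadoop2"
 {code}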
 E.g. when reading Avro files using
 {code}
 sc.newAPIHadoopFile[AvroKey[SomeClass], NullWritable, AvroKeyInputFormat[SomeClass]]("hdfs://path/to/file.avro")
 {code}
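 For context, a self-contained version of that call, with the imports it
 needs (SomeClass is a placeholder for an Avro-generated record class):
 {code}
 import org.apache.avro.mapred.AvroKey
 import org.apache.avro.mapreduce.AvroKeyInputFormat
 import org.apache.hadoop.io.NullWritable
 import org.apache.spark.{SparkConf, SparkContext}

 val sc = new SparkContext(new SparkConf().setAppName("avro-read"))
 // Read Avro records through the new Hadoop API (mapreduce).
 val records = sc.newAPIHadoopFile[AvroKey[SomeClass], NullWritable,
   AvroKeyInputFormat[SomeClass]]("hdfs://path/to/file.avro")
 {code}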
 The following error occurs:
 {code}
 java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
   at org.apache.avro.mapreduce.AvroKeyInputFormat.createRecordReader(AvroKeyInputFormat.java:47)
   at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:111)
   at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:99)
   at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:61)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
   at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
   at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
   at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
   at org.apache.spark.scheduler.Task.run(Task.scala:51)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:744)
 {code}
 This error is usually a hint that the old and the new Hadoop APIs have been
 mixed up. As a workaround, if the hadoop2 variant of avro-mapred is forced to
 appear on the classpath before the version bundled with Spark, reading Avro
 files works fine. Likewise, building Spark itself against the hadoop2 variant
 of avro-mapred fixes the problem.
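 A minimal sketch of the classpath workaround, assuming Spark 1.x's
 spark.executor.extraClassPath setting (entries are prepended to the executor
 classpath; the jar path is a placeholder, and the driver side would need the
 equivalent, e.g. spark-submit's --driver-class-path):
 {code}
 import org.apache.spark.{SparkConf, SparkContext}

 // Prepend the hadoop2 avro-mapred jar so it shadows the hadoop1 copy
 // bundled in the Spark assembly (the path is illustrative).
 val conf = new SparkConf()
   .setAppName("avro-read-workaround")
   .set("spark.executor.extraClassPath", "/path/to/avro-mapred-1.7.6-hadoop2.jar")
 val sc = new SparkContext(conf)
 {code}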






[jira] [Resolved] (SPARK-3039) Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for hadoop 1 API

2014-12-26 Thread Patrick Wendell (JIRA)

 [ https://issues.apache.org/jira/browse/SPARK-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell resolved SPARK-3039.

Resolution: Fixed

This fix did appear in Spark 1.2.0, so I'm closing this issue.

 Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for hadoop 1 API
 --

 Key: SPARK-3039
 URL: https://issues.apache.org/jira/browse/SPARK-3039
 Project: Spark
  Issue Type: Bug
  Components: Build, Input/Output, Spark Core
Affects Versions: 0.9.1, 1.0.0, 1.1.0
 Environment: hadoop2, hadoop-2.4.0, HDP-2.1
Reporter: Bertrand Bossy
Assignee: Bertrand Bossy
 Fix For: 1.2.0





[jira] [Resolved] (SPARK-3039) Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for hadoop 1 API

2014-09-14 Thread Patrick Wendell (JIRA)

 [ https://issues.apache.org/jira/browse/SPARK-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell resolved SPARK-3039.

      Resolution: Fixed
   Fix Version/s: 1.1.1, 1.2.0
Target Version/s: 1.1.1, 1.2.0

Resolved by:
https://github.com/apache/spark/pull/1945

 Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for hadoop 1 API
 --

 Key: SPARK-3039
 URL: https://issues.apache.org/jira/browse/SPARK-3039
 Project: Spark
  Issue Type: Bug
  Components: Build, Input/Output, Spark Core
Affects Versions: 0.9.1, 1.0.0, 1.1.0
 Environment: hadoop2, hadoop-2.4.0, HDP-2.1
Reporter: Bertrand Bossy
Assignee: Bertrand Bossy
 Fix For: 1.1.1, 1.2.0





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org