Github user medale commented on the pull request:

    https://github.com/apache/spark/pull/4315#issuecomment-72785613

The problem was that the Spark project hive-exec 0.13.1a depends on:

```
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-mapred</artifactId>
  <version>${avro.version}</version>
</dependency>
```

(see http://central.maven.org/maven2/org/spark-project/hive/hive-exec/0.13.1a/hive-exec-0.13.1a.pom)

Its parent defines avro.version as 1.7.5:

```
<avro.version>1.7.5</avro.version>
```

(see http://central.maven.org/maven2/org/spark-project/hive/hive/0.13.1a/hive-0.13.1a.pom)

The only places hive-exec is used as a dependency are:

```
find . -name pom.xml | xargs grep hive-exec
```

pom.xml (where we define it in the dependencyManagement section)
sql/hive/pom.xml (in the actual dependencies)

In sql/hive/pom.xml we also have an explicit dependency on:

```
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-mapred</artifactId>
  <classifier>${avro.mapred.classifier}</classifier>
</dependency>
```

Therefore, if we choose a profile that does not define avro.mapred.classifier, this field is left empty (see the main pom.xml: `<avro.mapred.classifier></avro.mapred.classifier>`) and we pull avro-mapred-1.7.6.jar (exactly the same as avro-mapred-1.7.6-hadoop1.jar), as it should be. If we choose a profile like hadoop-2.4, the classifier is set to hadoop2 and we pull avro-mapred-1.7.6-hadoop2.jar, as it should be.
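Since the root pom already manages hive-exec in its dependencyManagement section, one way to keep every module on the same Avro artifact would be to manage avro-mapred there as well. This is only an illustrative sketch (not the exact Spark pom contents); the property names mirror the snippets quoted above:

```
<!-- Sketch: manage avro-mapred centrally so every module that declares it
     (e.g. sql/hive/pom.xml) resolves the same version and classifier.
     ${avro.version} and ${avro.mapred.classifier} are the properties
     discussed above. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro-mapred</artifactId>
      <version>${avro.version}</version>
      <classifier>${avro.mapred.classifier}</classifier>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Note that managed entries are matched on groupId/artifactId/classifier, which is why the classifier has to appear here for the hadoop2 variant to be covered.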
```
<profile>
  <id>hadoop-2.4</id>
  <properties>
    <hadoop.version>2.4.0</hadoop.version>
    <protobuf.version>2.5.0</protobuf.version>
    <jets3t.version>0.9.0</jets3t.version>
    <hbase.version>0.98.7-hadoop2</hbase.version>
    <commons.math3.version>3.1.1</commons.math3.version>
    <avro.mapred.classifier>hadoop2</avro.mapred.classifier>
  </properties>
</profile>
```

However, with the changes in 1.3.0-SNAPSHOT, avro-mapred's scope is newly defined as:

```
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-mapred</artifactId>
  <version>${avro.version}</version>
  <classifier>${avro.mapred.classifier}</classifier>
  <scope>${hive.deps.scope}</scope>
</dependency>
```

That scope is set per module:

main pom.xml: <hive.deps.scope>compile</hive.deps.scope>
assembly/pom.xml: <hive.deps.scope>provided</hive.deps.scope>
examples/pom.xml: <hive.deps.scope>provided</hive.deps.scope>

The same applies to hive-exec. So the competing avro-mapred classes will no longer be included in spark-assembly.jar. They are not included on the Hadoop classpath (only Avro is), so they need to be supplied by the job. That will be new for Avro users. But excluding avro-mapred from the hive-exec dependency and explicitly pinning avro-mapred to 1.7.6 with the correct classifier will be necessary if anything like the Maven Enforcer plugin is ever run.
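The exclusion-plus-pin approach described above could look roughly like the following in sql/hive/pom.xml. This is a sketch of the idea, not the actual PR diff; the version element would normally come from dependencyManagement rather than being hard-coded:

```
<!-- Sketch: strip the transitive avro-mapred (1.7.5, no classifier) that
     hive-exec brings in, then declare avro-mapred directly so the build
     always uses 1.7.6 with the profile-selected classifier. -->
<dependency>
  <groupId>org.spark-project.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <exclusions>
    <exclusion>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro-mapred</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-mapred</artifactId>
  <version>1.7.6</version>
  <classifier>${avro.mapred.classifier}</classifier>
</dependency>
```

With the exclusion in place, an enforcer rule such as dependencyConvergence would no longer see the conflicting 1.7.5/1.7.6 versions of avro-mapred.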