Github user medale commented on the pull request:

    https://github.com/apache/spark/pull/4315#issuecomment-72785613

The problem was that the Spark project hive-exec 0.13.1a depends on:

```
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-mapred</artifactId>
  <version>${avro.version}</version>
</dependency>
```

(see http://central.maven.org/maven2/org/spark-project/hive/hive-exec/0.13.1a/hive-exec-0.13.1a.pom)

Its parent defines avro.version as 1.7.5:

```
<avro.version>1.7.5</avro.version>
```

(see http://central.maven.org/maven2/org/spark-project/hive/hive/0.13.1a/hive-0.13.1a.pom)

The only places hive-exec is used as a dependency are:

```
find . -name pom.xml | xargs grep hive-exec
```

pom.xml (where we define it in the dependencyManagement section)
sql/hive/pom.xml (in the actual dependencies)

In sql/hive/pom.xml we also have an explicit dependency on:

```
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-mapred</artifactId>
  <classifier>${avro.mapred.classifier}</classifier>
</dependency>
```

Therefore, if we choose a profile that does not define avro.mapred.classifier, this field is left empty (see the main pom.xml: `<avro.mapred.classifier></avro.mapred.classifier>`) and we pull avro-mapred-1.7.6.jar (exactly the same as avro-mapred-1.7.6-hadoop1.jar), as it should be. If we choose a profile like hadoop-2.4, the classifier is set to hadoop2 and we pull avro-mapred-1.7.6-hadoop2.jar, as it should be.
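Since the root pom already manages hive-exec in its dependencyManagement section, one way to keep every module on the same Avro artifact would be to manage avro-mapred there as well. This is only an illustrative sketch (not the exact Spark pom contents); the property names mirror the snippets quoted above:

```
<!-- Sketch: manage avro-mapred centrally so every module that declares it
     (e.g. sql/hive/pom.xml) resolves the same version and classifier.
     ${avro.version} and ${avro.mapred.classifier} are the properties
     discussed above. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro-mapred</artifactId>
      <version>${avro.version}</version>
      <classifier>${avro.mapred.classifier}</classifier>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Note that managed entries are matched on groupId/artifactId/classifier, which is why the classifier has to appear here for the hadoop2 variant to be covered.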
```
<profile>
  <id>hadoop-2.4</id>
  <properties>
    <hadoop.version>2.4.0</hadoop.version>
    <protobuf.version>2.5.0</protobuf.version>
    <jets3t.version>0.9.0</jets3t.version>
    <hbase.version>0.98.7-hadoop2</hbase.version>
    <commons.math3.version>3.1.1</commons.math3.version>
    <avro.mapred.classifier>hadoop2</avro.mapred.classifier>
  </properties>
</profile>
```

However, with the changes in 1.3.0-SNAPSHOT, avro-mapred's scope is newly defined as:

```
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-mapred</artifactId>
  <version>${avro.version}</version>
  <classifier>${avro.mapred.classifier}</classifier>
  <scope>${hive.deps.scope}</scope>
</dependency>
```

That scope is set per module:

main pom.xml: <hive.deps.scope>compile</hive.deps.scope>
assembly/pom.xml: <hive.deps.scope>provided</hive.deps.scope>
examples/pom.xml: <hive.deps.scope>provided</hive.deps.scope>

The same applies to hive-exec. So the competing avro-mapred classes will no longer be included in spark-assembly.jar. They are not included on the Hadoop classpath (only Avro is), so they need to be supplied by the job. That will be new for Avro users. But excluding avro-mapred from the hive-exec dependency and explicitly pinning avro-mapred to 1.7.6 with the correct classifier will be necessary if anything like the Maven Enforcer plugin is ever run.
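The exclusion-plus-pin approach described above could look roughly like the following in sql/hive/pom.xml. This is a sketch of the idea, not the actual PR diff; the version element would normally come from dependencyManagement rather than being hard-coded:

```
<!-- Sketch: strip the transitive avro-mapred (1.7.5, no classifier) that
     hive-exec brings in, then declare avro-mapred directly so the build
     always uses 1.7.6 with the profile-selected classifier. -->
<dependency>
  <groupId>org.spark-project.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <exclusions>
    <exclusion>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro-mapred</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-mapred</artifactId>
  <version>1.7.6</version>
  <classifier>${avro.mapred.classifier}</classifier>
</dependency>
```

With the exclusion in place, an enforcer rule such as dependencyConvergence would no longer see the conflicting 1.7.5/1.7.6 versions of avro-mapred.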