Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/20923#discussion_r178057319
--- Diff: hadoop-cloud/pom.xml ---
@@ -177,6 +214,188 @@
</exclusion>
</exclusions>
</dependency>
+ <!--
+ the AWS module pulls in jackson; its transitive dependencies can create
+ intra-jackson-module version problems.
+ -->
+ <dependency>
+ <groupId>org.apache.hadoop</groupId>
+ <artifactId>hadoop-aws</artifactId>
+ <version>${hadoop.version}</version>
+ <scope>${hadoop.deps.scope}</scope>
+ <exclusions>
+ <exclusion>
+ <groupId>org.apache.hadoop</groupId>
+ <artifactId>hadoop-common</artifactId>
+ </exclusion>
+ <exclusion>
+ <groupId>commons-logging</groupId>
+ <artifactId>commons-logging</artifactId>
+ </exclusion>
+ <exclusion>
+ <groupId>org.codehaus.jackson</groupId>
+ <artifactId>jackson-mapper-asl</artifactId>
+ </exclusion>
+ <exclusion>
+ <groupId>org.codehaus.jackson</groupId>
+ <artifactId>jackson-core-asl</artifactId>
+ </exclusion>
+ <exclusion>
+ <groupId>com.fasterxml.jackson.core</groupId>
+ <artifactId>jackson-core</artifactId>
+ </exclusion>
+ <exclusion>
+ <groupId>com.fasterxml.jackson.core</groupId>
+ <artifactId>jackson-databind</artifactId>
+ </exclusion>
+ <exclusion>
+ <groupId>com.fasterxml.jackson.core</groupId>
+ <artifactId>jackson-annotations</artifactId>
+ </exclusion>
+ </exclusions>
+ </dependency>
+ <dependency>
+ <groupId>org.apache.hadoop</groupId>
+ <artifactId>hadoop-openstack</artifactId>
+ <version>${hadoop.version}</version>
+ <scope>${hadoop.deps.scope}</scope>
+ <exclusions>
+ <exclusion>
+ <groupId>org.apache.hadoop</groupId>
+ <artifactId>hadoop-common</artifactId>
+ </exclusion>
+ <exclusion>
+ <groupId>commons-logging</groupId>
+ <artifactId>commons-logging</artifactId>
+ </exclusion>
+ <exclusion>
+ <groupId>junit</groupId>
+ <artifactId>junit</artifactId>
+ </exclusion>
+ <exclusion>
+ <groupId>org.mockito</groupId>
+ <artifactId>mockito-all</artifactId>
+ </exclusion>
+ </exclusions>
+ </dependency>
+
+ <!--
+ Add joda time to ensure that anything downstream which doesn't pull in spark-hive
+ gets the correct joda time artifact, so doesn't have auth failures on later Java 8 JVMs
+ -->
+ <dependency>
+ <groupId>joda-time</groupId>
+ <artifactId>joda-time</artifactId>
+ <scope>${hadoop.deps.scope}</scope>
+ </dependency>
+ <!-- explicitly declare the jackson artifacts desired -->
+ <dependency>
+ <groupId>com.fasterxml.jackson.core</groupId>
+ <artifactId>jackson-databind</artifactId>
+ <scope>${hadoop.deps.scope}</scope>
+ </dependency>
+ <dependency>
+ <groupId>com.fasterxml.jackson.core</groupId>
+ <artifactId>jackson-annotations</artifactId>
+ <scope>${hadoop.deps.scope}</scope>
+ </dependency>
+ <dependency>
+ <groupId>com.fasterxml.jackson.dataformat</groupId>
+ <artifactId>jackson-dataformat-cbor</artifactId>
+ <version>${fasterxml.jackson.version}</version>
+ </dependency>
+ <!--Explicit declaration to force in Spark version into transitive dependencies -->
+ <dependency>
+ <groupId>org.apache.httpcomponents</groupId>
+ <artifactId>httpclient</artifactId>
+ <scope>${hadoop.deps.scope}</scope>
+ </dependency>
+ <!--Explicit declaration to force in Spark version into transitive dependencies -->
+ <dependency>
+ <groupId>org.apache.httpcomponents</groupId>
+ <artifactId>httpcore</artifactId>
+ <scope>${hadoop.deps.scope}</scope>
+ </dependency>
+ </dependencies>
+ </profile>
+
+ <!--
+ Hadoop 3 simplifies the classpath, and adds a new committer base class which
+ enables store-specific committers.
+ -->
+ <profile>
+ <id>hadoop-3</id>
+ <properties>
+ <extra.source.dir>src/hadoop-3/main/scala</extra.source.dir>
+ <extra.testsource.dir>src/hadoop-3/test/scala</extra.testsource.dir>
+ </properties>
+
+ <build>
+ <plugins>
+ <!-- Include a source dir depending on the Scala version -->
+ <plugin>
+ <groupId>org.codehaus.mojo</groupId>
+ <artifactId>build-helper-maven-plugin</artifactId>
+ <executions>
+ <execution>
+ <id>add-scala-sources</id>
+ <phase>generate-sources</phase>
+ <goals>
+ <goal>add-source</goal>
+ </goals>
+ <configuration>
+ <sources>
+ <source>${extra.source.dir}</source>
+ </sources>
+ </configuration>
+ </execution>
+ <execution>
+ <id>add-scala-test-sources</id>
+ <phase>generate-test-sources</phase>
+ <goals>
+ <goal>add-test-source</goal>
+ </goals>
+ <configuration>
+ <sources>
+ <source>${extra.testsource.dir}</source>
+ </sources>
+ </configuration>
+ </execution>
+ </executions>
+ </plugin>
+ </plugins>
+
+ </build>
+ <dependencies>
+
+ <!--
+ There's now a hadoop-cloud-storage which transitively pulls in the store JARs,
+ but it still needs some selective exclusion across versions, especially 3.0.x.
--- End diff --
Excluding hadoop-client means there's no need to worry about any of the
artifacts which the Spark root POM explicitly excludes from hadoop-client
(asm/asm, jackson, etc.).
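The diff quoted above is cut off before the dependency declaration itself, so the following is only a sketch of what such an exclusion could look like, not the text of the patch. The `${hadoop.version}` and `${hadoop.deps.scope}` properties are assumed to be the same ones used by the other dependencies in this POM.

```xml
<!-- Sketch only: assumes the hadoop.version and hadoop.deps.scope
     properties used elsewhere in hadoop-cloud/pom.xml -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-cloud-storage</artifactId>
  <version>${hadoop.version}</version>
  <scope>${hadoop.deps.scope}</scope>
  <exclusions>
    <!-- keep hadoop-client, and everything it drags in, off this path -->
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```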
Hadoop 3.0.1 declares hadoop-client as a compile-time dependency of
[hadoop-cloud-storage](https://github.com/apache/hadoop/blob/branch-3.0.1/hadoop-cloud-storage-project/hadoop-cloud-storage/pom.xml).
From 3.0.2 onwards it has been cut down to `provided`, and `azure-datalake`
has been added as a dependency ([commit
3c03672e](https://github.com/apache/hadoop/commit/3c03672e876ddbd6a6425ea1a056ad13adc309ea)),
so the module is complete w.r.t. the ASF connectors.
There's also a fix which excludes netty from the shaded AWS SDK,
[HADOOP-15264](https://github.com/apache/hadoop/commit/e015e009897e481edc79f4ba72e2c53610b178a3),
because of
[aws-sdk-java/issues/1488](https://github.com/aws/aws-sdk-java/issues/1488).
The individual hadoop cloud modules (hadoop-aws, hadoop-azure, ...) have
also downgraded hadoop-client to `provided`, so if you pull in any of
those, you only get the extra artifacts needed to connect to the relevant
cloud endpoint, and are expected to pull in the same hadoop-client version
elsewhere for things to work.
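As a sketch of what that means for a downstream build which uses one connector directly: both artifacts are declared explicitly, at the same version. The property name `my.hadoop.version` is hypothetical, standing in for whatever the downstream POM defines.

```xml
<!-- Sketch only: my.hadoop.version is a hypothetical property
     defined by the downstream build -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-aws</artifactId>
  <version>${my.hadoop.version}</version>
</dependency>
<!-- hadoop-aws no longer brings this in transitively, so declare it
     explicitly at the matching version -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>${my.hadoop.version}</version>
</dependency>
```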
Here's the dependency list for spark-hadoop-cloud built against
3.0.2-SNAPSHOT; 3.1 will be the same unless there's a last-minute update to
one of the external SDKs or jetty.
```
[INFO] +- org.apache.hadoop:hadoop-cloud-storage:jar:3.0.2-SNAPSHOT:compile
[INFO] |  +- org.apache.hadoop:hadoop-aliyun:jar:3.0.2-SNAPSHOT:compile
[INFO] |  |  \- com.aliyun.oss:aliyun-sdk-oss:jar:2.8.3:compile
[INFO] |  |     \- org.jdom:jdom:jar:1.1:compile
[INFO] |  +- org.apache.hadoop:hadoop-aws:jar:3.0.2-SNAPSHOT:compile
[INFO] |  |  \- com.amazonaws:aws-java-sdk-bundle:jar:1.11.271:compile
[INFO] |  +- org.apache.hadoop:hadoop-azure:jar:3.0.2-SNAPSHOT:compile
[INFO] |  |  +- com.microsoft.azure:azure-storage:jar:5.4.0:compile
[INFO] |  |  |  \- com.microsoft.azure:azure-keyvault-core:jar:0.8.0:compile
[INFO] |  |  \- org.eclipse.jetty:jetty-util-ajax:jar:9.3.19.v20170502:compile
[INFO] |  +- org.apache.hadoop:hadoop-azure-datalake:jar:3.0.2-SNAPSHOT:compile
[INFO] |  |  \- com.microsoft.azure:azure-data-lake-store-sdk:jar:2.2.5:compile
[INFO] |  \- org.apache.hadoop:hadoop-openstack:jar:3.0.2-SNAPSHOT:compile
```
Given that Hadoop 3.0.2+ downgrades hadoop-client to `provided`, and that's
the minimum version this patch will build against, the exclusion is mostly
superfluous: it's there to block regressions rather than to actually keep
hadoop-client out.