GitHub user steveloughran opened a pull request:
https://github.com/apache/spark/pull/20923
[SPARK-23807][BUILD][WIP] Add Hadoop 3 profile with relevant POM fix ups,
cloud-storage artifacts and binding
## What changes were proposed in this pull request?
1. Adds a `hadoop-3` profile which builds against the Hadoop 3.1 artifacts.
It's tagged as WiP because Hadoop 3.1 isn't out the door yet; the profile
depends on the Hadoop 3.1-SNAPSHOT artifacts.
1. In the hadoop-cloud module, adds an explicit hadoop-3 profile which
switches from explicitly pulling in the cloud connectors (hadoop-openstack,
hadoop-aws, hadoop-azure) to depending on the hadoop-cloud-storage POM
artifact, which pulls these in, pre-excludes things like hadoop-common, and
stays up to date with new connectors (hadoop-azure-datalake, hadoop-aliyun).
Goal: keeping this clean becomes the Hadoop project's homework, and the Spark
project doesn't need to handle new Hadoop releases adding more dependencies.
It also lines up Spark for switching to a shaded hadoop-cloud-storage bundle
once one is implemented.
1. In the hadoop-cloud module, adds new source files and tests for connecting
to the `PathOutputCommitter` factory mechanism of Hadoop 3.1 (a sketch of that
mechanism follows this list).
1. Increases the Curator and ZooKeeper versions to match those in Hadoop 3,
fixing Spark core to build in SBT with the hadoop-3 dependencies.
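As a pointer for reviewers, here is a minimal Scala sketch (not code from this
patch) of how the Hadoop 3.1 `PathOutputCommitterFactory` mechanism resolves a
committer: the factory is looked up by the destination filesystem scheme, then
asked for a committer. The object name and the choice of the S3A "directory"
committer are purely illustrative.

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapreduce.TaskAttemptContext
import org.apache.hadoop.mapreduce.lib.output.{PathOutputCommitter, PathOutputCommitterFactory}

object PathCommitterSketch {

  /** Resolve the committer for a destination path via the Hadoop 3.1 factory lookup. */
  def committerFor(dest: Path, context: TaskAttemptContext): PathOutputCommitter = {
    val conf = context.getConfiguration
    // Bind the "s3a" scheme to the S3A committer factory shipped in hadoop-aws 3.1.
    conf.set("mapreduce.outputcommitter.factory.scheme.s3a",
      "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory")
    // Tell that factory which S3A committer to create: "directory", "partitioned" or "magic".
    conf.set("fs.s3a.committer.name", "directory")
    // Scheme-driven lookup; schemes without a registered factory fall back to
    // the classic FileOutputCommitter behaviour.
    PathOutputCommitterFactory
      .getCommitterFactory(dest, conf)
      .createOutputCommitter(dest, context)
  }
}
```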
Why 3.1-SNAPSHOT over 3.0.1?
* 3.0.0 has to be viewed as an early release of the code; 3.1 should be the
stable one.
* The committer changes are only in the forthcoming 3.1.0 and 3.0.2
releases.
* The cloud-storage dependencies are still unstable in the 3.0.x line (too
many transitive dependencies, hadoop-aliyun omitted). The hadoop-3 profile does
exclude the transitive cruft, for anyone who does want to use branch-3.0 builds.
Hadoop 3.1 should be viewed as the version where Hadoop 3.x is really ready
to play.
## How was this patch tested?
* There are some minimal unit tests of the new source in the hadoop-cloud
module when built with the hadoop-3 profile;
* Everything has been built and tested against both ASF Hadoop
branch-3.1 and Hadoop trunk.
The spark-hive JAR has problems here, as its version-check logic fails for
Hadoop versions > 2.
This can be avoided with either of:
* Building the Hadoop JARs so they declare their version as Hadoop 2.11: `mvn
install -DskipTests -DskipShade -Ddeclared.hadoop.version=2.11`. This is safe
for local test runs, but not for deployment (HDFS is very strict about
cross-version deployment).
* A modified version of spark-hive whose version-check switch statement is
happy with Hadoop 3.
I've done both, with Maven and SBT.
Two issues surfaced:
1. A spark-core test failure, fixed in SPARK-23787.
1. SBT only: ZooKeeper not being found in spark-core. Somehow Curator
2.12.0 triggers slightly different dependency resolution logic from
previous versions, and Ivy was missing zookeeper.jar entirely. This patch adds
the explicit ZooKeeper declaration for all Spark profiles, setting the ZK
version to 3.4.9 for hadoop-3 (illustrated below).
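To illustrate issue 2, this is the kind of explicit pin involved, written
build.sbt-style (the actual change is in the Maven POMs; the artifact
coordinates here are only an example):

```scala
// Pin ZooKeeper explicitly so the resolution path that Curator 2.12.0 triggers
// cannot drop zookeeper.jar from the Ivy/SBT classpath.
libraryDependencies ++= Seq(
  "org.apache.curator"   % "curator-recipes" % "2.12.0",
  "org.apache.zookeeper" % "zookeeper"       % "3.4.9"   // matches hadoop-3
)
```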
The integration tests against real infrastructures live [on
github](https://github.com/hortonworks-spark/cloud-integration/tree/master/cloud-examples).
These verify that S3, Azure WASB, Azure Data Lake and OpenStack Swift stores
can be used as the source and destination of work.
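For a sense of what those suites check, a minimal round-trip sketch (the
bucket and path are made up; the real tests are in the repository linked
above):

```scala
import org.apache.spark.sql.SparkSession

object CloudRoundTripSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("cloud-roundtrip-sketch").getOrCreate()
    // Hypothetical destination; s3a://, wasb://, adl:// or swift:// all work once
    // the matching connector and credentials are on the classpath.
    val dest = "s3a://example-bucket/output/ids"
    spark.range(1000).toDF("id").write.mode("overwrite").parquet(dest)
    assert(spark.read.parquet(dest).count() == 1000)  // read back what was written
    spark.stop()
  }
}
```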
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/steveloughran/spark
cloud/SPARK-23807-hadoop-31
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20923.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20923
----
commit 29e73242cba9797ed24127b24bb0380c69a608d3
Author: Steve Loughran <stevel@...>
Date: 2018-03-28T17:38:57Z
SPARK-23807 Add Hadoop 3 profile with relevant POM fix ups, cloud-storage
artifacts and binding
Change-Id: Ia4526f184ced9eef5b67aee9e91eced0dd38d723
----