Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/14646
I'm adding the ability to test against staged releases, such as Hadoop
2.7.3 RC1. With this profile, testing that spark runs against a new RC is just
a matter of setting the version with a -D and asking for the staging artifacts;
there's no need to edit the POMs at all:
```
dev/make-distribution.sh -Pyarn,hadoop-2.7,snapshots-and-staging \
  -Dhadoop.version=2.7.3
```
If all I wanted to do was test against locally built artifacts, I wouldn't
need the profile: just `mvn install` in Hadoop, then build spark with
`-Dhadoop.version=2.8.0-SNAPSHOT`; that works perfectly well. What this patch
adds is the ability to test against the real ASF RC artifacts, and so do
regression testing against them.
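For reference, a minimal sketch of that local workflow; the checkout
locations here are assumptions:
```
# build and install Hadoop snapshot artifacts into the local maven repo,
# then build spark against them; no extra profile is needed for this path
cd ~/work/hadoop        # assumed location of a Hadoop branch-2 checkout
mvn install -DskipTests
cd ~/work/spark         # assumed location of the Spark checkout
./build/mvn -Pyarn,hadoop-2.7 -Dhadoop.version=2.8.0-SNAPSHOT -DskipTests package
```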
I used this as part of the review of the RC; it'll need to be repeated when
the 2.8.x RCs are out.
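When they are, the same command line should work unchanged; a sketch, with the
version string as a placeholder for whatever gets staged:
```
# hypothetical: substitute the actual 2.8.x RC version once it is staged
dev/make-distribution.sh -Pyarn,hadoop-2.7,snapshots-and-staging \
  -Dhadoop.version=2.8.0
```
For reference, the vote text from the 2.7.3 review: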
```
+1 binding
1. built and tested Apache Slider (incubating) against the Hadoop 2.7.3
artifacts
2. did a build & test of the Apache Spark master branch with the 2.7.3 JARs.
For that I had to tweak spark's build to support the staging repo;
hopefully that will get into Spark:
https://issues.apache.org/jira/browse/SPARK-17058
3. did a test run of my WiP SPARK-7481 spark-cloud module; after fixing a
couple of things on the test setup side related to HADOOP-13058,
   mvn test -pl cloud -Pyarn,hadoop-2.7,snapshots-and-staging \
     -Dhadoop.version=2.7.3 -Dcloud.test.configuration.file=../conf/cloud-tests.xml
all was well, albeit measurably slower than with Hadoop 2.8. That's proof that
the 2.8 version of s3a really does deliver a measurable speedup for those tests
(currently just file input/seek; more to come). I had originally thought things
were broken as s3 init was failing, but that's because the s3 bucket was in
Frankfurt, and the AWS library used can't talk to that endpoint: it requires
the v4 auth protocol (see the endpoint sketch after this quote).
4. did a full spark distribution build of that SPARK-7481 branch
   dev/make-distribution.sh -Pyarn,hadoop-2.7,snapshots-and-staging \
     -Dhadoop.version=2.7.3
then ran a command-line test to read s3a data:
   bin/spark-submit --class org.apache.spark.cloud.s3.examples.S3LineCount \
     --conf spark.hadoop.fs.s3a.access.key=$AWS_KEY \
     --conf spark.hadoop.fs.s3a.secret.key=$AWS_SECRET \
     examples/jars/spark-examples_2.11-2.1.0-SNAPSHOT.jar
5. Pulled out the Microsoft Azure JAR azure-storage-2.0.0.jar and repeated
step 4 (sketched after this quote).
This showed that the 2.7.x branch does handle the failure to load a
filesystem caused by dependency or other classloading problems. That had been
a big problem in adding the aws & azure stuff to the spark build, as it'd stop
spark from starting up if the dependencies were absent.
I've not done any of the .tar.gz diligence; I've just looked at the staged
JARs and how they worked with downstream apps, that being a key way that
Hadoop artifacts are adopted.
```
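On the Frankfurt point in item 3: the Hadoop 2.8+ s3a client can talk to
v4-only regions once the region's endpoint is declared explicitly. A hedged
sketch of that workaround, reusing the spark-submit example above (the
endpoint value is the standard eu-central-1 one; this doesn't help with the
2.7.x AWS SDK, which is the point of the failure described):
```
# sketch: declare the Frankfurt endpoint so s3a signs requests with v4 auth;
# only effective with the Hadoop 2.8+ s3a client on the classpath
bin/spark-submit --class org.apache.spark.cloud.s3.examples.S3LineCount \
  --conf spark.hadoop.fs.s3a.endpoint=s3.eu-central-1.amazonaws.com \
  --conf spark.hadoop.fs.s3a.access.key=$AWS_KEY \
  --conf spark.hadoop.fs.s3a.secret.key=$AWS_SECRET \
  examples/jars/spark-examples_2.11-2.1.0-SNAPSHOT.jar
```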
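And the item-5 classloading check as a sketch; the paths assume it is run
from inside the generated distribution directory, and the JAR name is the one
quoted above:
```
# remove the Azure storage dependency from the distribution, then rerun the
# s3a read from step 4: spark should still start up, with only attempts to
# load the azure filesystem failing
rm jars/azure-storage-2.0.0.jar
bin/spark-submit --class org.apache.spark.cloud.s3.examples.S3LineCount \
  --conf spark.hadoop.fs.s3a.access.key=$AWS_KEY \
  --conf spark.hadoop.fs.s3a.secret.key=$AWS_SECRET \
  examples/jars/spark-examples_2.11-2.1.0-SNAPSHOT.jar
```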