Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/14646
  
    I'm adding the ability to test against staged releases, such as Hadoop 
2.7.3 RC1. With this profile added, verifying that Spark runs with a new RC is 
just a matter of setting the version with a `-D` and asking for the staging 
artifacts; there's no need to edit the POMs at all:
    
    ```
    dev/make-distribution.sh -Pyarn,hadoop-2.7,snapshots-and-staging -Dhadoop.version=2.7.3
    ```
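    
    For reference, the profile itself is just repository plumbing: it wires the 
ASF staging and snapshot repositories into the build so Maven can resolve RC 
artifacts. A rough sketch of its shape (repository ids and layout here are 
illustrative; the patch itself is authoritative):
    
    ```
    <profile>
      <id>snapshots-and-staging</id>
      <repositories>
        <!-- ASF Nexus staging group: where RC artifacts are published for voting -->
        <repository>
          <id>ASF Staging</id>
          <url>https://repository.apache.org/content/groups/staging/</url>
        </repository>
        <!-- ASF snapshot repository: nightly/dev builds -->
        <repository>
          <id>ASF Snapshots</id>
          <url>https://repository.apache.org/content/repositories/snapshots/</url>
          <snapshots><enabled>true</enabled></snapshots>
          <releases><enabled>false</enabled></releases>
        </repository>
      </repositories>
    </profile>
    ```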
    
    If all I wanted to do was test against locally built artifacts, I wouldn't 
need the profile: just run `mvn install` in Hadoop, then build Spark with 
`-Dhadoop.version=2.8.0-SNAPSHOT`; that works perfectly well. What this patch 
adds is the ability to test against the real ASF RC artifacts, and so to do 
regression testing against them.
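    
    A minimal sketch of that local-only workflow (directory names are 
illustrative):
    
    ```
    # build Hadoop and install its artifacts into the local ~/.m2 repository
    cd hadoop && mvn install -DskipTests
    
    # then build Spark against the snapshot version; no extra profile needed
    cd ../spark
    dev/make-distribution.sh -Pyarn,hadoop-2.7 -Dhadoop.version=2.8.0-SNAPSHOT
    ```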
    
    I used this as part of the review of the RC; it'll need to be repeated when 
the 2.8.x RCs are out.
    
    ```
    +1 binding
    
    
    1. built and tested Apache Slider (incubating) against the Hadoop 2.7.3 artifacts
    
    2. did a build & test of the Apache Spark master branch with the 2.7.3 JARs.
    
    For that I had to tweak Spark's build to support the staging repo; hopefully that change will get into Spark:
    
    https://issues.apache.org/jira/browse/SPARK-17058
    
    3. did a test run of my WiP SPARK-7481 spark-cloud module, after fixing a couple of things on the test setup side related to HADOOP-13058:
    
        mvn test -pl cloud -Pyarn,hadoop-2.7,snapshots-and-staging -Dhadoop.version=2.7.3 -Dcloud.test.configuration.file=../conf/cloud-tests.xml
    
    all was well, albeit measurably slower than against Hadoop 2.8. That's 
proof that the 2.8 version of s3a really does deliver a measurable speedup for 
those tests (currently just file input/seek; more to come). I had originally 
thought things were broken, as s3 init was failing, but that turned out to be 
because the s3 bucket was in Frankfurt, and the AWS library used can't talk to 
that endpoint, which requires the v4 auth protocol (see the endpoint note 
below).
    
    4. did a full Spark distribution build of that SPARK-7481 branch:
    
        dev/make-distribution.sh -Pyarn,hadoop-2.7,snapshots-and-staging -Dhadoop.version=2.7.3
    
    then ran a command-line test to read s3a data (the example class is sketched below):
    
        bin/spark-submit --class org.apache.spark.cloud.s3.examples.S3LineCount \
                         --conf spark.hadoop.fs.s3a.access.key=$AWS_KEY \
                         --conf spark.hadoop.fs.s3a.secret.key=$AWS_SECRET \
                         examples/jars/spark-examples_2.11-2.1.0-SNAPSHOT.jar
    
    5. pulled out the Microsoft Azure JAR, azure-storage-2.0.0.jar, and repeated step 4.
    
    This showed that the 2.7.x branch does handle the failure to load a 
filesystem due to dependency or other classloading problems. That had been 
proving a big problem in adding the AWS & Azure stuff to the Spark build, as it 
would stop Spark from starting up if the dependencies were absent.
    
    I've not done any of the .tar.gz diligence; I've just looked at the staged 
JARs and how they worked with downstream apps, that being a key way that 
Hadoop artifacts are adopted.
    
    ```
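
    On the Frankfurt failure in step 3: that region only supports the v4 
signing protocol, which the AWS SDK shipped with Hadoop 2.7.x can't use there; 
with the Hadoop 2.8+ s3a client you can reach it by pointing the client at the 
region-specific endpoint. A sketch of that setting (for core-site.xml, or as a 
spark.hadoop.* override):
    
    ```
    <!-- assumption: Hadoop 2.8+ s3a client; Frankfurt is a v4-auth-only region -->
    <property>
      <name>fs.s3a.endpoint</name>
      <value>s3.eu-central-1.amazonaws.com</value>
    </property>
    ```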
    
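    And for context on step 4: `S3LineCount` is just a minimal "read an s3a 
path and count its lines" job from the WiP SPARK-7481 branch. A hypothetical 
sketch of the shape of such an example (the internals and the default path are 
assumptions, not the actual code):
    
    ```
    package org.apache.spark.cloud.s3.examples
    
    import org.apache.spark.{SparkConf, SparkContext}
    
    object S3LineCount {
      def main(args: Array[String]): Unit = {
        // take the input path from the command line, or fall back to a (hypothetical) default
        val path = if (args.nonEmpty) args(0) else "s3a://example-bucket/data.csv"
        val sc = new SparkContext(new SparkConf().setAppName("S3LineCount"))
        try {
          // reading forces the s3a filesystem client to be loaded, authenticated and exercised
          println(s"$path: ${sc.textFile(path).count()} lines")
        } finally {
          sc.stop()
        }
      }
    }
    ```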