Some people may have noticed I've been working on adding packaging, docs & 
testing for getting Spark to work with S3, Azure and openstack into a Spark 
distribution,

https://github.com/apache/spark/pull/12004

It's been a WiP, but now I've got tests for all three cloud infrastructures, 
tests covering: basic IO, output committing, dataframe IO and streaming, the 
core test coverage is done; the packaging working.

Which means I'd really like some reviews by people who want to have spark work 
with S3, Azure or their local Swift endpoint to review that PR, ideally going 
through the documentation and validating that as well as the code.
It's Hadoop 2.7+ only, with a new profile, "cloud", to pull in the new module 
of the same name.

thanks

-Steve

PS: documentation (without templated code rendering):
https://github.com/steveloughran/spark/blob/features/SPARK-7481-cloud/docs/cloud-integration.md


Reply via email to