Some people may have noticed I've been working on adding packaging, docs & testing for getting Spark to work with S3, Azure and openstack into a Spark distribution,
https://github.com/apache/spark/pull/12004 It's been a WiP, but now I've got tests for all three cloud infrastructures, tests covering: basic IO, output committing, dataframe IO and streaming, the core test coverage is done; the packaging working. Which means I'd really like some reviews by people who want to have spark work with S3, Azure or their local Swift endpoint to review that PR, ideally going through the documentation and validating that as well as the code. It's Hadoop 2.7+ only, with a new profile, "cloud", to pull in the new module of the same name. thanks -Steve PS: documentation (without templated code rendering): https://github.com/steveloughran/spark/blob/features/SPARK-7481-cloud/docs/cloud-integration.md