Github user srowen commented on the issue:
https://github.com/apache/spark/pull/12004
If I may, I believe the intent here is to add an extra dependency-only
module that pulls in Hadoop's integration modules for various cloud stores.
Building with this module enabled adds some support for S3, Azure, etc.
It also adds documentation and some very basic smoke tests.
I think the use case is a custom build of Spark for standalone deployment
on a cloud provider? A Hadoop cluster would already have these dependencies.
I think the upside is clear: the docs are useful, and a pre-packaged way to
pull in these dependencies correctly is useful.
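To make that concrete, here is a minimal sketch of what the enabled support
would look like from an application, assuming the Hadoop S3A connector
(hadoop-aws plus its AWS SDK dependency) ends up on Spark's classpath; the
bucket, prefix, and credential wiring are placeholders of mine, not anything
from this PR:

```scala
import org.apache.spark.sql.SparkSession

object S3ASketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("s3a-read-sketch")
      // Placeholder credential wiring; instance profiles or core-site.xml
      // are other options. spark.hadoop.* settings are copied into the
      // Hadoop Configuration.
      .config("spark.hadoop.fs.s3a.access.key",
        sys.env.getOrElse("AWS_ACCESS_KEY_ID", ""))
      .config("spark.hadoop.fs.s3a.secret.key",
        sys.env.getOrElse("AWS_SECRET_ACCESS_KEY", ""))
      .getOrCreate()

    // An s3a:// path resolves only if the S3A connector classes are present.
    val lines = spark.read.text("s3a://some-bucket/some-prefix/")
    println(s"Read ${lines.count()} lines")

    spark.stop()
  }
}
```

Whether that path really works well in practice is part of my hesitation
below.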
My outstanding hesitations are:
- The added complexity of another module
- Do people really want to build support for all cloud providers, or just
the one they use? If just one, can they bundle it with their app (see the
sketch after this list)? (I have the feeling I asked this before and forgot
the answer.)
- Does it telegraph some commitment to working with, say, S3, that isn't
really there? That is, I'm not clear that you can really use Spark with S3
effectively even after this, or am I not up to date?
- We may be about to drop support for Hadoop < 2.6, or < 2.7. Does that
change the right way to do this?
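On the bundling question in the second bullet, here is a sketch of the
alternative as I understand it (my assumption, not something this PR adds):
the application pulls in just the one connector it needs, e.g. in build.sbt.
The versions are placeholders and have to match the Hadoop version Spark was
built against:

```scala
// build.sbt sketch: depend on only the S3 connector rather than a Spark
// build that bundles every cloud provider.
libraryDependencies ++= Seq(
  // Spark itself is provided by the cluster / distribution.
  "org.apache.spark" %% "spark-sql" % "2.1.0" % "provided",
  // hadoop-aws pulls in a matching aws-java-sdk transitively; exclude
  // hadoop-common since Spark already provides the Hadoop classes.
  ("org.apache.hadoop" % "hadoop-aws" % "2.7.3")
    .exclude("org.apache.hadoop", "hadoop-common")
)
```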