Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/12004
  
    If I may, I believe the intent here is to add an extra dependency-only module that pulls in Hadoop's integration modules for various cloud stores. Building with this module enabled bakes in some support for S3, Azure, etc.
    
    There are docs for this as well, and some very basic smoke tests.
    
    I think the use case is a custom build of Spark for standalone deployment on a cloud provider, since a Hadoop cluster would already have these dependencies. The upside is clear: docs are nice, and a pre-packaged way to pull in these deps correctly is nice.
    
    My outstanding hesitations are:
    
    - Well, the complexity of another module
    - Do people really want to build in support for all cloud providers, or just the one they use? If just one, can they bundle the connector with their app instead? (I have the feeling I asked this before and forgot the answer; a rough sketch of what that might look like is after this list.)
    - Does it telegraph some commitment to working with, say, S3, that isn't really there? That is, I'm not clear you can really use Spark with S3 well even after this, or am I not up to date?
    - We may be about to drop support for Hadoop < 2.6, or even < 2.7. Does that change the right way to do this?
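
    On the second point, a minimal sketch of what bundling a single connector with an application might look like, assuming an sbt build against a Hadoop 2.7.x-based Spark distribution; the artifact versions and exclusion here are illustrative, not a tested recipe:

    ```scala
    // build.sbt (hypothetical application build)
    // Pull in only the S3 connector; hadoop-aws should match the Hadoop
    // version the Spark distribution was built against.
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "2.0.0" % "provided",
      ("org.apache.hadoop" % "hadoop-aws" % "2.7.2")
        .exclude("org.apache.hadoop", "hadoop-common")  // already on Spark's classpath
    )
    ```

    If that works well enough for the single-provider case, it bears on whether the extra module carries its weight.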

