[jira] [Commented] (SPARK-7481) Add Hadoop 2.6+ profile to pull in object store FS accessors

2016-03-28 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15214576#comment-15214576
 ] 

Steve Loughran commented on SPARK-7481:
---

I've created a pull request on this, which has

# a new module (tentative name, `spark-cloud`) which has transitive 
dependencies on the hadoop and amazon/microsoft JARs
# a dependency in spark assembly on the module and the hadoop JARs, *excluding 
those amazon/microsoft JARs*.

This reinstates s3n and adds s3a, swift and (on Hadoop 2.7+) wasb support. For 
s3a and wasb, you will need to add the external JAR during job submission.
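
As a rough sketch of the second point, the assembly's dependency on the new module could exclude the third-party SDKs by group. The artifact id `spark-cloud_${scala.binary.version}`, the `${project.version}` property and the `com.microsoft.azure` group are assumptions for illustration, not taken from the pull request; wildcard exclusions also assume Maven 3.2.1 or later.

```xml
<!-- Hypothetical fragment for the assembly POM; names are illustrative. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-cloud_${scala.binary.version}</artifactId>
  <version>${project.version}</version>
  <exclusions>
    <!-- Keep the Hadoop connector classes, drop the Amazon SDK JARs. -->
    <exclusion>
      <groupId>com.amazonaws</groupId>
      <artifactId>*</artifactId>
    </exclusion>
    <!-- Likewise drop the Microsoft storage client that wasb needs (assumed group id). -->
    <exclusion>
      <groupId>com.microsoft.azure</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

With a layout like this the s3a and wasb classes would ship in the assembly, while the matching SDK JAR is supplied by the user at submission time.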

> Add Hadoop 2.6+ profile to pull in object store FS accessors
> 
>
> Key: SPARK-7481
> URL: https://issues.apache.org/jira/browse/SPARK-7481
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 1.3.1
>Reporter: Steve Loughran
>
> To keep the s3n classpath right, to add s3a, swift & azure, the dependencies 
> of spark in a 2.6+ profile need to add the relevant object store packages 
> (hadoop-aws, hadoop-openstack, hadoop-azure)
> this adds more stuff to the client bundle, but will mean a single spark 
> package can talk to all of the stores.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7481) Add Hadoop 2.6+ profile to pull in object store FS accessors

2016-03-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15214574#comment-15214574
 ] 

Apache Spark commented on SPARK-7481:
-

User 'steveloughran' has created a pull request for this issue:
https://github.com/apache/spark/pull/12004




[jira] [Commented] (SPARK-7481) Add Hadoop 2.6+ profile to pull in object store FS accessors

2016-03-19 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197451#comment-15197451
 ] 

Nicholas Chammas commented on SPARK-7481:
-

(Sorry Steve; can't comment on your proposal since I don't know much about 
these kinds of build decisions.)

Just to add some more evidence to the record that this problem appears to 
affect many people, take a look at this: 
http://stackoverflow.com/search?q=%5Bapache-spark%5D+S3+Hadoop+2.6

Lots of confusion about how to access S3, with the recommended solution as 
before being to [use Spark built against Hadoop 
2.4|http://stackoverflow.com/a/30852341/877069].




[jira] [Commented] (SPARK-7481) Add Hadoop 2.6+ profile to pull in object store FS accessors

2016-03-16 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197165#comment-15197165
 ] 

Steve Loughran commented on SPARK-7481:
---

...thinking some more about this

How about 

# adding a {{spark-cloud}} module which, initially, does nothing but declare 
the dependencies on {{hadoop-aws}}, {{hadoop-openstack}}, and on 2.7+, 
{{hadoop-azure}}. 
# have spark assembly declare a dependency on this module, but explicitly 
exclude all dependencies other than the hadoop ones (i.e. no amazon libs, no 
extra httpclient ones for openstack (if there are any), nothing that azure 
wants). If someone wants the relevant amazon libs, they need to add them 
explicitly with the {{--jars}} option.

Doing it this way means that if a project depends on {{spark-cloud}} it gets 
all the cloud dependencies that version of spark+hadoop needs.
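
A minimal sketch of what such a module's POM could declare. The `hadoop-2.7` profile id and the `${hadoop.version}` property are assumptions about the Spark build, and the fragment omits the parent and packaging elements a real module would need.

```xml
<!-- Hypothetical spark-cloud/pom.xml fragment: dependencies only. -->
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-aws</artifactId>
    <version>${hadoop.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-openstack</artifactId>
    <version>${hadoop.version}</version>
  </dependency>
</dependencies>

<profiles>
  <!-- hadoop-azure is only available from Hadoop 2.7 onwards. -->
  <profile>
    <id>hadoop-2.7</id>
    <dependencies>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-azure</artifactId>
        <version>${hadoop.version}</version>
      </dependency>
    </dependencies>
  </profile>
</profiles>
```

A project depending on this module would then inherit whichever connectors match the Hadoop version it is built against.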

It also provides a placeholder for explicit cloud support, specifically

- output committers that don't try to rename, or assume that directory delete 
is atomic and O(1)
- some optional tests/examples to read/write data. 

The tests would be good not just for spark, but for catching regressions in 
hadoop/aws/azure code.

If people think this is good, assign it to me and I'll look at it in April.




[jira] [Commented] (SPARK-7481) Add Hadoop 2.6+ profile to pull in object store FS accessors

2016-03-03 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177922#comment-15177922
 ] 

Steve Loughran commented on SPARK-7481:
---

For comparison, the full AWS SDK is 13MB and the s3 SDK is 570K, so the latter 
is something that could possibly be added. But adding it sets up an implicit 
commitment to keep it there, and would lead to discussion about why not azure, 
google gfs, ... Saying "add the aws-s3-sdk JAR if you want it" avoids making 
any such commitment. 




[jira] [Commented] (SPARK-7481) Add Hadoop 2.6+ profile to pull in object store FS accessors

2016-03-03 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177918#comment-15177918
 ] 

Steve Loughran commented on SPARK-7481:
---

Longer term, having a spark_home/lib/*.jar directory is the best general-purpose solution.

For now,

{{hadoop-aws}} can be added to the existing 2.6 profile, explicitly excluding 
the full amazon SDK jar. This would give s3n back to the code. Jets3t is still 
in the spark-assembly JAR today.

If built with 2.6.x, you'd get s3n and, if you added the full aws-SDK JAR with 
--addjars, s3a support.

If you built with 2.7.x (e.g. {{-Dhadoop.version=2.7.2}}) you'd get s3n, s3a 
and, implicitly, the (much smaller) {{amazon-s3-sdk}} JAR needed to talk with 
S3. Users wouldn't need to add the amazon-aws-sdk.jar to the submission (it 
would cause link problems if they tried).

Or, to keep the assembly JAR small, {{amazon-s3-sdk}} could also be excluded. 
This would add the ASF classes, but you'd always need to add the right JAR for 
the hadoop version you compiled against (Amazon changed a parameter from an int 
to a long in a method, see)




[jira] [Commented] (SPARK-7481) Add Hadoop 2.6+ profile to pull in object store FS accessors

2016-03-03 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177623#comment-15177623
 ] 

Steve Loughran commented on SPARK-7481:
---

Hadoop 2.6 added S3a, which we put into a new hadoop-tools/hadoop-aws JAR, 
along with a dependency on Amazon's {{aws-java}} SDK. Someone other than myself 
went and moved the existing S3n classes into the same JAR. If I'd seen that, 
I'd have -1'd it, but I didn't notice until 2.6 shipped.

As stated, I wouldn't use S3a in Hadoop 2.6.x. HADOOP-11571 contains the 
reasons. It wasn't until Hadoop 2.7 that it became ready for serious use.

Both come in hadoop-aws; s3a needs an amazon JAR, which must be matched 
precisely with the version used in the hadoop library.






[jira] [Commented] (SPARK-7481) Add Hadoop 2.6+ profile to pull in object store FS accessors

2016-03-02 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176559#comment-15176559
 ] 

Nicholas Chammas commented on SPARK-7481:
-

I'm not comfortable working with Maven so I can't comment on the details of the 
approach we should take, but I will appreciate any progress towards making 
Spark built against Hadoop 2.6+ work with S3 out of the box, or as close to out 
of the box as possible.

Given Spark's close relation to S3 and EC2 (as far as Spark's user base is 
concerned), a good out of the box experience here is critical. Many people just 
expect it.




[jira] [Commented] (SPARK-7481) Add Hadoop 2.6+ profile to pull in object store FS accessors

2016-03-02 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176551#comment-15176551
 ] 

Nicholas Chammas commented on SPARK-7481:
-

{quote}
One issue here is that hadoop 2.6's hadoop-aws pulls in the whole AWS SDK, 
which is pretty weighty, for s3a ... which isn't something I'd use in 2.6 
anyway.
{quote}

Did you mean something other than s3a here?




[jira] [Commented] (SPARK-7481) Add Hadoop 2.6+ profile to pull in object store FS accessors

2016-03-02 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15175664#comment-15175664
 ] 

Steve Loughran commented on SPARK-7481:
---

One issue here is that hadoop 2.6's hadoop-aws pulls in the whole AWS SDK, 
which is pretty weighty, for s3a ... which isn't something I'd use in 2.6 
anyway.

Hadoop 2.7 moved to the (link-time-incompatible) amazon-s3 JAR, and also adds 
hadoop-azure with some wasb JAR. And from Hadoop 2.7 onwards, s3a is the one I 
would choose in preference to s3n. 

What might work is a hadoop 2.6 profile which explicitly adds hadoop-aws, then 
excludes the amazon sdk
{code}
<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-java-sdk</artifactId>
  <scope>compile</scope>
</dependency>
{code}

This would automatically pick up the {{aws-java-sdk-s3}} JAR on a 2.7+ build, 
because it's not excluded by name. Though then there's fun if you try to add 
the {{aws-java-sdk}} JAR needed for Hadoop 2.6 to the classpath, as it won't 
link. Which makes me think that excluding {{aws-java-sdk-s3}} as well would be 
safer. The hadoop code to talk to s3a and s3n would be there, s3n would work as 
well/badly as it always does, and for s3a you'd need to add the right aws JAR 
for your hadoop version.
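
Put together, the profile idea could look roughly like the fragment below. The profile id `hadoop-2.6` and the `${hadoop.version}` property are assumptions about the Spark build; the exclusion shown is the first variant discussed, covering only the full SDK artifact.

```xml
<!-- Hypothetical profile sketch; ids and properties are assumptions. -->
<profile>
  <id>hadoop-2.6</id>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-aws</artifactId>
      <version>${hadoop.version}</version>
      <exclusions>
        <!-- Leave the full Amazon SDK out of the assembly. -->
        <exclusion>
          <groupId>com.amazonaws</groupId>
          <artifactId>aws-java-sdk</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
  </dependencies>
</profile>
```

To follow the safer variant, a second `<exclusion>` naming `aws-java-sdk-s3` would be added in the same way, so that no Amazon JAR is bundled for any Hadoop version.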




[jira] [Commented] (SPARK-7481) Add Hadoop 2.6+ profile to pull in object store FS accessors

2016-03-01 Thread Peng Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174442#comment-15174442
 ] 

Peng Cheng commented on SPARK-7481:
---

+1 Me four




[jira] [Commented] (SPARK-7481) Add Hadoop 2.6+ profile to pull in object store FS accessors

2016-03-01 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174438#comment-15174438
 ] 

Nicholas Chammas commented on SPARK-7481:
-

Many people seem to be downgrading to use Spark built against Hadoop 2.4 
because the Spark / Hadoop 2.6 package doesn't work against S3 out of the box.

* [Example 
1|https://issues.apache.org/jira/browse/SPARK-7442?focusedCommentId=14582965=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14582965]
* [Example 
2|https://issues.apache.org/jira/browse/SPARK-7442?focusedCommentId=14903750=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14903750]
* [Example 
3|https://github.com/nchammas/flintrock/issues/88#issuecomment-190905262]

If this proposal eliminates that bit of friction for users without being too 
burdensome on the team, then I'm for it.

Ideally, we want people using Spark built against the latest version of Hadoop 
anyway, right? This proposal would nudge people in that direction.




[jira] [Commented] (SPARK-7481) Add Hadoop 2.6+ profile to pull in object store FS accessors

2016-02-21 Thread Yardena (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15156101#comment-15156101
 ] 

Yardena commented on SPARK-7481:


+1, please add this.
lib/* approach would be great, or a profile like initially suggested (which is 
what we do manually right now). 
Thanks.




[jira] [Commented] (SPARK-7481) Add Hadoop 2.6+ profile to pull in object store FS accessors

2016-02-01 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126175#comment-15126175
 ] 

Steve Loughran commented on SPARK-7481:
---

Having a lib/* would be fantastic, as it stops spark having to worry about the 
details, or explain to users what they have to do.

Whoever wants to use WASB, google fs or s3a would have to put in the relevant 
JARs, both hadoop ones and third party, but they could either do that 
themselves or spark/bigtop could add the profile.




[jira] [Commented] (SPARK-7481) Add Hadoop 2.6+ profile to pull in object store FS accessors

2016-01-31 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15125566#comment-15125566
 ] 

Marcelo Vanzin commented on SPARK-7481:
---

It doesn't necessarily affect this proposal. It would make it easier to have 
people add these separately - just drop the jars in Spark's "lib" directory and 
suddenly they're part of Spark.

But if you don't add the dependency explicitly in Spark's build, they'll not be 
included in Spark's packaging, so there would still be a manual step to add 
support for those backends.




[jira] [Commented] (SPARK-7481) Add Hadoop 2.6+ profile to pull in object store FS accessors

2016-01-30 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15125143#comment-15125143
 ] 

Josh Rosen commented on SPARK-7481:
---

How does this proposal change if we just remove the assembly and ship a folder 
of JARs, as has been proposed elsewhere by [~vanzin]? Does that render this 
proposal moot?




[jira] [Commented] (SPARK-7481) Add Hadoop 2.6+ profile to pull in object store FS accessors

2016-01-06 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15085477#comment-15085477
 ] 

Steve Loughran commented on SPARK-7481:
---

Josh,

There is a 2.6 profile, but all it currently does is bump up the dependencies 
of other things (jets3t, curator, etc). It doesn't pull in hadoop-aws, which is 
where the s3a, s3n stuff lives, or the amazon JAR which is needed for s3a to 
work (the fact that s3n moved to the new JAR was something somebody else did; 
I'd have probably vetoed it if I'd noticed). 

The amazon JAR in Hadoop 2.6, `aws-java-sdk`, is huge, and not something you'd 
want in the spark assembly. Hadoop 2.7+ has switched to the leaner 
aws-java-sdk-s3; HADOOP-12269 has shown how that's been a bit brittle over 
versions.

Pulling in all the amazon SDK bits into the assembly jar is something that 
could be done if targeting Hadoop 2.7+, but you'd need care to make sure that 
the exact amazon lib that Hadoop was built against is used.

It'd be easier if
# `bin/spark-class` (and transitively, things like the yarn launcher) grabbed 
*.jar from the Spark lib dir, so all people would need to do is drop in the 
appropriate aws JAR (or for azure, the MSFT azure JAR)
# the 2.6 profile added hadoop-aws to the dependencies of the spark assembly 
(and hadoop-openstack)
# a 2.7 profile added hadoop-azure

That is: the hadoop code is used (all fairly thin), but the third party JARs 
are left out.

This would mean the assembly had all the Hadoop stuff, and all people needed to 
do was drop the external JARs into the lib directory.

What do you think?




[jira] [Commented] (SPARK-7481) Add Hadoop 2.6+ profile to pull in object store FS accessors

2016-01-05 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083646#comment-15083646
 ] 

Josh Rosen commented on SPARK-7481:
---

Hey, is this task done? I see that we have a {{hadoop2.6}} profile now.




[jira] [Commented] (SPARK-7481) Add Hadoop 2.6+ profile to pull in object store FS accessors

2015-08-19 Thread Thomas Demoor (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14703025#comment-14703025
 ] 

Thomas Demoor commented on SPARK-7481:
--

[~srowen] and [~pc...@uowmail.edu.au]:  with HADOOP-12269 merged in s3a only 
needs aws-sdk-core, aws-sdk-kms and aws-sdk-s3, with combined size of ~1.4MB 
(down from 11.5MB), updated dependencies, no Kinesis, ...




[jira] [Commented] (SPARK-7481) Add Hadoop 2.6+ profile to pull in object store FS accessors

2015-08-19 Thread Peng Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14703585#comment-14703585
 ] 

Peng Cheng commented on SPARK-7481:
---

Thanks a lot! A long run to the end.




[jira] [Commented] (SPARK-7481) Add Hadoop 2.6+ profile to pull in object store FS accessors

2015-07-30 Thread Thomas Demoor (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647406#comment-14647406
 ] 

Thomas Demoor commented on SPARK-7481:
--

Pulled the aws-upgrade out of HADOOP-11684 to a separate issue HADOOP-12269. 
Only uses aws-sdk-s3-1.10.6 instead of the entire sdk.
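
In a Maven build, depending on the S3 component alone might look like the fragment below. The artifact id `aws-java-sdk-s3` is the name used elsewhere in this thread and the version is the one quoted above; whether the core and KMS libraries arrive transitively is an assumption that should be checked against the SDK's own POM.

```xml
<!-- Sketch only: the S3 component in place of the full aws-java-sdk artifact. -->
<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-java-sdk-s3</artifactId>
  <version>1.10.6</version>
</dependency>
```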




[jira] [Commented] (SPARK-7481) Add Hadoop 2.6+ profile to pull in object store FS accessors

2015-05-28 Thread Thomas Demoor (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14563033#comment-14563033
 ] 

Thomas Demoor commented on SPARK-7481:
--

[~srowen] and [~ste...@apache.org], for s3a, things should improve in future 
Hadoop versions. I have a first patch set up for [HADOOP-11684] that also bumps 
the aws-sdk to a recent version. From aws-sdk 1.9 onwards, you can pull in 
individual components separately. For s3a, we only need s3 (and evidently the 
core lib), which solves both the large size and the kinesis issue.




[jira] [Commented] (SPARK-7481) Add Hadoop 2.6+ profile to pull in object store FS accessors

2015-05-23 Thread Peng Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557549#comment-14557549
 ] 

Peng Cheng commented on SPARK-7481:
---

I've tried to do it but I get a lot of headaches, as the aws toolkit is using an 
outdated jackson library.
Still, this feature is indeed blocking me from upgrading to hadoop 2.6, so I 
guess it's important.




[jira] [Commented] (SPARK-7481) Add Hadoop 2.6+ profile to pull in object store FS accessors

2015-05-08 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14534168#comment-14534168
 ] 

Sean Owen commented on SPARK-7481:
--

Yikes, that seems like a load of stuff to pull in. Can't this / shouldn't this 
be added by the end user if desired?




[jira] [Commented] (SPARK-7481) Add Hadoop 2.6+ profile to pull in object store FS accessors

2015-05-08 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14534174#comment-14534174
 ] 

Steve Loughran commented on SPARK-7481:
---

This doesn't contain any endorsement of the use of s3a in Hadoop 2.6; see 
HADOOP-11571

I'm not planning to add any tests for this, but it's something to consider for 
regression testing all the object stores. The tests just need to:
* be skipped if there are no credentials
* make a best effort to stop anyone accidentally checking in their credentials
* work on desktop/jenkins rather than just in the cloud
* not run up massive bills
* not take forever

AWS publishes some free-to-read datasets, such as [this 
one|http://datasets.elasticmapreduce.s3.amazonaws.com/], which don't need 
credentials, work remotely and don't run up bills for the read part of the 
process, but would take a long time to complete on a single executor. 
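
As a rough illustration of the first two requirements in the list above, a test can take its credentials from the environment and skip itself when they are missing, so nothing secret lives in the source tree. This is only a sketch in Python, not anything from the pull request: the environment variable names, the helper `credentials_available` and the test class are all hypothetical.

```python
import os
import unittest

# Hypothetical variable names; the thread does not say how credentials
# would be supplied to the tests.
CREDENTIAL_VARS = ("OBJECT_STORE_ACCESS_KEY", "OBJECT_STORE_SECRET_KEY")


def credentials_available(env=None):
    """Return True only if every credential variable is set and non-empty."""
    env = os.environ if env is None else env
    return all(env.get(name) for name in CREDENTIAL_VARS)


# Without credentials the whole class is reported as skipped, not failed.
@unittest.skipUnless(credentials_available(), "no object store credentials set")
class ObjectStoreRegressionTest(unittest.TestCase):
    def test_credentials_present(self):
        # Placeholder body: a real test would read one small object,
        # which keeps both the run time and the bill low.
        self.assertTrue(credentials_available())
```

Reading the credentials from the environment rather than from a checked-in file is one way to make accidental commits of secrets less likely; it does not rule them out.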




[jira] [Commented] (SPARK-7481) Add Hadoop 2.6+ profile to pull in object store FS accessors

2015-05-08 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14534210#comment-14534210
 ] 

Sean Owen commented on SPARK-7481:
--

Maybe I'd be less frightened if I knew the size of these deps and their 
dependencies was small, and the licenses were all OK, etc. This would need some 
checking; I know we had a license problem and so forth with Kinesis, and have 
had jets3t problems, etc. I am maybe needlessly wary of doing this several 
times over to add more niche FS clients to the main build for everyone.




[jira] [Commented] (SPARK-7481) Add Hadoop 2.6+ profile to pull in object store FS accessors

2015-05-08 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14534622#comment-14534622
 ] 

Steve Loughran commented on SPARK-7481:
---

* hadoop-openstack: 100K, plus httpclient (400K)
* hadoop-aws: 85K, plus jets3t (500K)
* s3a needs the aws toolkit @ 11.5MB, so it's the big one
* azure: 500K

To retain s3n in spark, the hadoop-aws and jets3t dependencies need to go in; 
s3a is a fairly large addition.
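
To put those numbers side by side, the sizes can be summed with and without the aws toolkit. The sketch below only restates the figures quoted above; it assumes "K" means KB and that 1 MB = 1024 KB, neither of which the comment states, and the helper `total_kb` is a name made up for this example.

```python
# Sizes in KB as quoted in the comment. The 11.5MB figure is converted
# assuming 1 MB = 1024 KB (an assumption; the comment does not say).
AWS_TOOLKIT = "aws toolkit (needed by s3a)"
SIZES_KB = {
    "hadoop-openstack": 100,
    "httpclient": 400,
    "hadoop-aws": 85,
    "jets3t": 500,
    AWS_TOOLKIT: 11.5 * 1024,
    "azure": 500,
}


def total_kb(sizes, exclude=()):
    """Sum the sizes, leaving out any names listed in `exclude`."""
    return sum(size for name, size in sizes.items() if name not in exclude)


everything = total_kb(SIZES_KB)
without_toolkit = total_kb(SIZES_KB, exclude=(AWS_TOOLKIT,))
print(f"all stores: {everything / 1024:.1f} MB")
print(f"without the aws toolkit: {without_toolkit / 1024:.1f} MB")
```

Under these assumptions, everything except the aws toolkit adds up to 1585K, so the toolkit accounts for most of the total.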
