[jira] [Commented] (HADOOP-15387) Produce a shaded hadoop-cloud-storage JAR for applications to use

2019-02-04 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760085#comment-16760085
 ] 

Steve Loughran commented on HADOOP-15387:
-

Well, hadoop-client-api could perhaps be used, but it'd mean that the JAR 
explicitly included things which are @Private types for internal use by various 
hadoop cloud modules. Doable? 

> Produce a shaded hadoop-cloud-storage JAR for applications to use
> -
>
> Key: HADOOP-15387
> URL: https://issues.apache.org/jira/browse/HADOOP-15387
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/adl, fs/azure, fs/oss, fs/s3, fs/swift
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Priority: Major
>
> Produce a maven-shaded hadoop-cloud-storage JAR for downstream use so that
>  * Hadoop dependency choices don't control their decisions
>  * Little/No risk of their JAR changes breaking Hadoop bits they depend on
> This JAR would pull in the shaded hadoop-client JAR, and the aws-sdk-bundle 
> JAR, neither of which would be unshaded (so yes, upgrading aws-sdks would be 
> a bit risky, but double shading a pre-shaded 30MB JAR is excessive on 
> multiple levels).
> Metrics of success: Spark, Tez, Flink etc can pick up and use, and all are 
> happy



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15387) Produce a shaded hadoop-cloud-storage JAR for applications to use

2019-01-30 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756273#comment-16756273
 ] 

Sean Busbey commented on HADOOP-15387:
--

let me think on this a bit. I think the hadoop-common thing is fixable without 
too much heartburn.

the goal is that a downstreamer adds {{org.apache.hadoop:hadoop-cloud-storage}} 
as a dependency and things work, right? (again ignoring some specifics around 
"we need these logging frameworks" etc)

is "things work" just for "I'm using FileSystem APIs to access the cloud 
storage system X"? or is it some other subset of downstream facing APIs? I know 
your original description said this should pull in the hadoop-client stuff, but 
would it be too much to instead say use of {{hadoop-cloud-storage}} always 
required {{hadoop-client-api}} and {{hadoop-client-runtime}}? Specifically as 
transitive dependencies, not like we'd make folks always add 3 entries to 
maven? (though I suspect most practical uses will require listing one of those 
directly if folks use {{dependency:analyze}})
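
As a rough sketch of that setup (not something the project ships as-is): a 
downstream pom could declare just the cloud storage artifact and, if 
{{dependency:analyze}} complains, also list {{hadoop-client-api}} directly. The 
{{hadoop.version}} property is an assumption for illustration.

```xml
<!-- Sketch only: the hadoop.version property is a placeholder. -->
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-cloud-storage</artifactId>
    <version>${hadoop.version}</version>
  </dependency>
  <!-- Listed directly because application code compiles against it. -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client-api</artifactId>
    <version>${hadoop.version}</version>
  </dependency>
</dependencies>
```

In this sketch {{hadoop-client-runtime}} is not listed at all; it would be 
expected to arrive transitively.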

would making the individual SDKs optional be too onerous? essentially it would 
mean everyone would add {{hadoop-cloud-storage}} and they'd add the SDK(s) for 
whichever of the implementations they were going to actually use. Or is the 
common use case here the opposite? like most downstream users will want to 
opt-out via maven exclusions rather than needing to opt-in? opt-in would mean 
we could make it so only folks who specifically want to work with a provider 
whose SDK leaks dependencies would be impacted by that leakage (and fixing it 
would stay the problem of the SDK provider and not us).
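
To make the two options concrete, here is a sketch of each. Option A assumes 
the SDKs were marked optional in {{hadoop-cloud-storage}} (the proposal, not 
necessarily the current state), so an S3 user adds the AWS bundle themselves. 
Option B assumes the SDKs come in transitively and a non-S3 user excludes the 
bundle. The version properties are placeholders.

```xml
<!-- Option A (opt-in): add the SDK you actually use alongside the module. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-cloud-storage</artifactId>
  <version>${hadoop.version}</version>
</dependency>
<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-java-sdk-bundle</artifactId>
  <version>${aws.sdk.version}</version>
</dependency>

<!-- Option B (opt-out): exclude the SDKs you do not need. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-cloud-storage</artifactId>
  <version>${hadoop.version}</version>
  <exclusions>
    <exclusion>
      <groupId>com.amazonaws</groupId>
      <artifactId>aws-java-sdk-bundle</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

The two options are alternatives; a real pom would contain only one of them.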




[jira] [Commented] (HADOOP-15387) Produce a shaded hadoop-cloud-storage JAR for applications to use

2019-01-30 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756033#comment-16756033
 ] 

Steve Loughran commented on HADOOP-15387:
-

I actually want to keep out the AWS SDK as it is already shaded. Google GCS has 
a shaded artifact too.

What I'm trying to avoid is having all those hadoop-common dependencies surface 
as requirements, along with the things needed by hadoop-azure-datalake, 
hadoop-azure, etc. 

Now, HADOOP-16080 highlights a variant problem: we use (and continue to use) 
things in hadoop-common which aren't in hadoop-client API. That's an 
interesting complication. As far as the object stores are concerned, 
hadoop-common is where we put the common classes, because that's how they are 
shared across otherwise isolated implementations. Not sure what to do there.




[jira] [Commented] (HADOOP-15387) Produce a shaded hadoop-cloud-storage JAR for applications to use

2019-01-29 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755284#comment-16755284
 ] 

Sean Busbey commented on HADOOP-15387:
--

[~ste...@apache.org] can you help me understand the scope here a bit?

I think what the description says is we end up with a single jar where all 
classes are in {{org.apache.hadoop}} or {{software.amazon.awssdk}} and we rely 
on shading to relocate any others (modulo the normal caveats on logging / 
tracing libraries that came up during the hadoop-client modules).

Does it need to be all of the Amazon AWS SDK? Is there some interface jar that 
we could use while allowing BYO-SDK? Or for that matter could we just update 
the various cloud storage modules to individually relocate things that aren't 
either hadoop-client-facing or their respective service's SDK?
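
For that last option, a per-module relocation could look roughly like the 
maven-shade-plugin fragment below. The relocated package 
({{com.google.common}}) and the {{org.apache.hadoop.shaded}} prefix are picked 
purely for illustration; which packages a given module would actually need to 
relocate is exactly the open question.

```xml
<!-- Sketch only: the package and shaded prefix are illustrative assumptions. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <!-- Move a third-party package under a Hadoop-owned prefix. -->
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>org.apache.hadoop.shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Packages that are not named in a relocation (here, the Hadoop classes and the 
service's own SDK) keep their original names.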
