Re: Missing module spark-hadoop-cloud in Maven central

2021-06-21 Thread Dongjoon Hyun
Hi, Stephen and Steve.

The Apache Spark community has started publishing it as a snapshot, and Apache
Spark 3.2.0 will be the first release to include it.

- https://repository.apache.org/content/groups/snapshots/org/apache/spark/spark-hadoop-cloud_2.12/3.2.0-SNAPSHOT/

Please check the snapshot artifacts and file an Apache Spark JIRA if you hit
any issues.
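
For anyone who wants to try it before 3.2.0 ships, one way (a sketch only; the
application class and jar below are placeholders, not part of the release) is
to let spark-submit resolve the snapshot from the ASF snapshots repository:

  spark-submit \
    --repositories https://repository.apache.org/content/groups/snapshots \
    --packages org.apache.spark:spark-hadoop-cloud_2.12:3.2.0-SNAPSHOT \
    --class com.example.MyApp \
    my-app.jar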

Bests,
Dongjoon.

On 2021/06/02 19:05:29, Steve Loughran  wrote: 
> Off the record: this really irritates me too, as it forces me to do local
> builds even though I shouldn't have to. Sometimes I do that for other
> reasons, but still.
> 
> Getting the cloud-storage module in was hard enough at the time that I
> wasn't going to push harder; I essentially stopped trying to get one into
> Spark after that, having effectively been told to go and play in my own
> fork (*).
> 
> https://github.com/apache/spark/pull/12004#issuecomment-259020494
> 
> Given that effort almost failed, to then say "now include the artifact and
> releases" wasn't something I was going to do; I had everything I needed for
> my own build, and trying to add new PRs struck me as an exercise in
> confrontation and futility.
> 
> Sean, if I do submit a PR which enables hadoop-cloud by default on the
> right versions, but strips the dependencies out of the final tarball,
> would that get some attention?
> 
> (*) Sean, of course, was a notable exception and very supportive.
> 
> On Wed, 2 Jun 2021 at 00:56, Stephen Coy  wrote:
> 
> > I have been building Apache Spark from source just so I can get this
> > dependency.
> >
> >
> >1. git checkout v3.1.1
> >    2. dev/make-distribution.sh --name hadoop-cloud-3.2 --tgz -Pyarn
> >    -Phadoop-3.2 -Phadoop-cloud -Phive-thriftserver -Dhadoop.version=3.2.0
> >
> >
> > It is kind of a nuisance having to do this though.
> >
> > Steve C
> >
> >
> > On 31 May 2021, at 10:34 pm, Sean Owen  wrote:
> >
> > I know it's not enabled by default when the binary artifacts are built,
> > but not exactly sure why it's not built separately at all. It's almost a
> > dependencies-only pom artifact, but there are two source files. Steve, do
> > you have an angle on that?
> >
> > On Mon, May 31, 2021 at 5:37 AM Erik Torres  wrote:
> >
> >> Hi,
> >>
> >> I'm following this documentation to
> >> configure my Spark-based application to interact with Amazon S3. However, I
> >> cannot find the spark-hadoop-cloud module in Maven central for the
> >> non-commercial distribution of Apache Spark. From the documentation I would
> >> expect that I can get this module as a Maven dependency in my project.
> >> However, I ended up building the spark-hadoop-cloud module from Spark's
> >> code.
> >>
> >> Is this the expected way to set up the integration with Amazon S3? I think
> >> I'm missing something here.
> >>
> >> Thanks in advance!
> >>
> >> Erik
> >>
> >
> 




Re: Missing module spark-hadoop-cloud in Maven central

2021-06-02 Thread Steve Loughran
Off the record: this really irritates me too, as it forces me to do local
builds even though I shouldn't have to. Sometimes I do that for other
reasons, but still.

Getting the cloud-storage module in was hard enough at the time that I
wasn't going to push harder; I essentially stopped trying to get one into
Spark after that, having effectively been told to go and play in my own
fork (*).

https://github.com/apache/spark/pull/12004#issuecomment-259020494

Given that effort almost failed, to then say "now include the artifact and
releases" wasn't something I was going to do; I had everything I needed for
my own build, and trying to add new PRs struck me as an exercise in
confrontation and futility.

Sean, if I do submit a PR which enables hadoop-cloud by default on the right
versions, but strips the dependencies out of the final tarball, would that
get some attention?

(*) Sean, of course, was a notable exception and very supportive.







On Wed, 2 Jun 2021 at 00:56, Stephen Coy  wrote:

> I have been building Apache Spark from source just so I can get this
> dependency.
>
>
>1. git checkout v3.1.1
>    2. dev/make-distribution.sh --name hadoop-cloud-3.2 --tgz -Pyarn
>    -Phadoop-3.2 -Phadoop-cloud -Phive-thriftserver -Dhadoop.version=3.2.0
>
>
> It is kind of a nuisance having to do this though.
>
> Steve C
>
>
> On 31 May 2021, at 10:34 pm, Sean Owen  wrote:
>
> I know it's not enabled by default when the binary artifacts are built,
> but not exactly sure why it's not built separately at all. It's almost a
> dependencies-only pom artifact, but there are two source files. Steve, do
> you have an angle on that?
>
> On Mon, May 31, 2021 at 5:37 AM Erik Torres  wrote:
>
>> Hi,
>>
>> I'm following this documentation to
>> configure my Spark-based application to interact with Amazon S3. However, I
>> cannot find the spark-hadoop-cloud module in Maven central for the
>> non-commercial distribution of Apache Spark. From the documentation I would
>> expect that I can get this module as a Maven dependency in my project.
>> However, I ended up building the spark-hadoop-cloud module from Spark's
>> code.
>>
>> Is this the expected way to set up the integration with Amazon S3? I think
>> I'm missing something here.
>>
>> Thanks in advance!
>>
>> Erik
>>
>


Re: Missing module spark-hadoop-cloud in Maven central

2021-06-01 Thread Stephen Coy
I have been building Apache Spark from source just so I can get this dependency.


  1.  git checkout v3.1.1
  2.  dev/make-distribution.sh --name hadoop-cloud-3.2 --tgz -Pyarn
      -Phadoop-3.2 -Phadoop-cloud -Phive-thriftserver -Dhadoop.version=3.2.0

It is kind of a nuisance having to do this though.
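
A lighter-weight variant that might work (unverified; it reuses the same
profiles as above) is to build and install only the hadoop-cloud module into
the local Maven repository instead of making a full distribution:

  ./build/mvn -Pyarn -Phadoop-3.2 -Phadoop-cloud -Dhadoop.version=3.2.0 \
      -pl hadoop-cloud -am -DskipTests install

After that, spark-hadoop-cloud_2.12 should be resolvable like any other
locally installed dependency.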

Steve C


On 31 May 2021, at 10:34 pm, Sean Owen <sro...@gmail.com> wrote:

I know it's not enabled by default when the binary artifacts are built, but not 
exactly sure why it's not built separately at all. It's almost a 
dependencies-only pom artifact, but there are two source files. Steve, do you
have an angle on that?

On Mon, May 31, 2021 at 5:37 AM Erik Torres <etserr...@gmail.com> wrote:
Hi,

I'm following this documentation to configure my Spark-based application to
interact with Amazon S3. However, I
cannot find the spark-hadoop-cloud module in Maven central for the 
non-commercial distribution of Apache Spark. From the documentation I would 
expect that I can get this module as a Maven dependency in my project. However, 
I ended up building the spark-hadoop-cloud module from Spark's code.

Is this the expected way to set up the integration with Amazon S3? I think I'm
missing something here.

Thanks in advance!

Erik
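
For what it's worth, once spark-hadoop-cloud (or the equivalent hadoop-aws
jars) is on the classpath, the cloud-integration guide mostly comes down to
the usual s3a settings. A rough sketch (the credentials provider and bucket
path below are examples, not requirements):

  # let S3A pick up credentials from the environment / instance profile
  spark-shell \
    --conf spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.DefaultAWSCredentialsProviderChain

  # then, inside the shell, s3a:// paths should resolve, e.g.
  #   spark.read.text("s3a://my-bucket/some/path/").show()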



Re: Missing module spark-hadoop-cloud in Maven central

2021-05-31 Thread Sean Owen
I know it's not enabled by default when the binary artifacts are built, but
not exactly sure why it's not built separately at all. It's almost a
dependencies-only pom artifact, but there are two source files. Steve, do
you have an angle on that?

On Mon, May 31, 2021 at 5:37 AM Erik Torres  wrote:

> Hi,
>
> I'm following this documentation to
> configure my Spark-based application to interact with Amazon S3. However, I
> cannot find the spark-hadoop-cloud module in Maven central for the
> non-commercial distribution of Apache Spark. From the documentation I would
> expect that I can get this module as a Maven dependency in my project.
> However, I ended up building the spark-hadoop-cloud module from Spark's
> code.
>
> Is this the expected way to set up the integration with Amazon S3? I think
> I'm missing something here.
>
> Thanks in advance!
>
> Erik
>