Re: Revisiting Online serving of Spark models?

2018-06-02 Thread Maximiliano Felice
Hi!

We're already in San Francisco waiting for the summit. We even think that
we spotted @holdenk this afternoon.

@Chris, we're really interested in the Meetup you're hosting. My team will
probably join from the beginning if you have room for us, and I'll join
later after discussing the topics on this thread. I'll send you an email
regarding this request.

Thanks

On Fri, Jun 1, 2018 at 7:26 AM, Saikat Kanjilal 
wrote:

> @Chris This sounds fantastic, please send summary notes for Seattle folks
>
> @Felix I work in downtown Seattle, and am wondering if we should host a tech
> meetup around model serving in Spark at my office or somewhere close by.
> Thoughts? I’m actually in the midst of building microservices to manage
> models, and when I say models I mean much more than machine learning models
> (think OR and process models as well)
>
> Regards
>
> Sent from my iPhone
>
> On May 31, 2018, at 10:32 PM, Chris Fregly  wrote:
>
> Hey everyone!
>
> @Felix:  thanks for putting this together.  I sent some of you a quick
> calendar event - mostly for me, so I don’t forget!  :)
>
> Coincidentally, this is the focus of the *Advanced Spark and
> TensorFlow Meetup* @5:30pm on June 6th (same night) here in SF!
>
> Everybody is welcome to come.  Here’s the link to the meetup that includes
> the signup link:
> *https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/250924195/*
> 
>
> We have an awesome lineup of speakers covering a lot of deep, technical
> ground.
>
> For those who can’t attend in person, we’ll be broadcasting live - and
> posting the recording afterward.
>
> All details are in the meetup link above…
>
> @holden/felix/nick/joseph/maximiliano/saikat/leif:  you’re more than
> welcome to give a talk. I can move things around to make room.
>
> @joseph:  I’d personally like an update on the direction of the Databricks
> proprietary ML Serving export format, which is similar to PMML but is not a
> standard in any way.
>
> Also, the Databricks ML Serving Runtime is only available to Databricks
> customers.  This seems in conflict with the community efforts described
> here.  Can you comment on behalf of Databricks?
>
> Look forward to your response, Joseph.
>
> See you all soon!
>
> —
>
>
> *Chris Fregly*
> Founder @ *PipelineAI* (100,000 Users)
> Organizer @ *Advanced Spark and TensorFlow Meetup* (85,000 Global Members)
>
>
> *San Francisco - Chicago - Austin - Washington DC - London - Dusseldorf*
> *Try our PipelineAI Community Edition with GPUs and TPUs!!*
>
>
> On May 30, 2018, at 9:32 AM, Felix Cheung 
> wrote:
>
> Hi!
>
> Thank you! Let’s meet then
>
> June 6 4pm
>
> Moscone West Convention Center
> 800 Howard Street, San Francisco, CA 94103
> 
>
> Ground floor (outside of conference area - should be available for all) -
> we will meet and decide where to go
>
> (I won't send a calendar invite because that would be too much noise for dev@)
>
> To paraphrase Joseph, we will use this to kick off the discussion and
> post notes after and follow up online. As for Seattle, I would be very
> interested to meet in person later and discuss ;)
>
>
> _
> From: Saikat Kanjilal 
> Sent: Tuesday, May 29, 2018 11:46 AM
> Subject: Re: Revisiting Online serving of Spark models?
> To: Maximiliano Felice 
> Cc: Felix Cheung , Holden Karau <
> hol...@pigscanfly.ca>, Joseph Bradley , Leif Walsh
> , dev 
>
>
> Would love to join but am in Seattle, thoughts on how to make this work?
>
> Regards
>
> Sent from my iPhone
>
> On May 29, 2018, at 10:35 AM, Maximiliano Felice <
> maximilianofel...@gmail.com> wrote:
>
> Big +1 to a meeting with fresh air.
>
> Could anyone send the invites? I don't really know the place
> Holden is talking about.
>
> 2018-05-29 14:27 GMT-03:00 Felix Cheung :
>
>> You had me at blue bottle!
>>
>> _
>> From: Holden Karau 
>> Sent: Tuesday, May 29, 2018 9:47 AM
>> Subject: Re: Revisiting Online serving of Spark models?
>> To: Felix Cheung 
>> Cc: Saikat Kanjilal , Maximiliano Felice <
>> maximilianofel...@gmail.com>, Joseph Bradley ,
>> Leif Walsh , dev 
>>
>>
>>
>> I'm down for that. We could all go for a walk, maybe to the Mint Plaza
>> Blue Bottle, and grab coffee (if the weather holds, we can have our design
>> meeting outside :p)?
>>
>> On Tue, May 29, 2018 at 9:37 AM, Felix Cheung 
>> wrote:
>>
>>> Bump.
>>>
>>> --
>>> *From:* Felix Cheung 
>>> *Sent:* Saturday, May 26, 2018 1:05:29 PM
>>> *To:* Saikat Kanjilal; Maximiliano Felice; Joseph Bradley
>>> *Cc:* Leif Walsh; Holden Karau; dev
>>>
>>> *Subject:* Re: Revisiting Online serving of 

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-02 Thread Denny Lee
+1

On Sat, Jun 2, 2018 at 4:53 PM Nicholas Chammas 
wrote:

> I'll give that a try, but I'll still have to figure out what to do if none
> of the release builds work with hadoop-aws, since Flintrock deploys Spark
> release builds to set up a cluster. Building Spark is slow, so we only do
> it if the user specifically requests a Spark version by git hash. (This is
> basically how spark-ec2 did things, too.)
>
>
> On Sat, Jun 2, 2018 at 6:54 PM Marcelo Vanzin  wrote:
>
>> If you're building your own Spark, definitely try the hadoop-cloud
>> profile. Then you don't even need to pull anything at runtime,
>> everything is already packaged with Spark.
>>
>> On Fri, Jun 1, 2018 at 6:51 PM, Nicholas Chammas
>>  wrote:
>> > pyspark --packages org.apache.hadoop:hadoop-aws:2.7.3 didn’t work for me
>> > either (even building with -Phadoop-2.7). I guess I’ve been relying on an
>> > unsupported pattern and will need to figure something else out going
>> > forward in order to use s3a://.
>> >
>> >
>> > On Fri, Jun 1, 2018 at 9:09 PM Marcelo Vanzin 
>> wrote:
>> >>
>> >> I have personally never tried to include hadoop-aws that way. But at
>> >> the very least, I'd try to use the same version of Hadoop as the Spark
>> >> build (2.7.3 IIRC). I don't really expect a different version to work,
>> >> and if it did in the past it definitely was not by design.
>> >>
>> >> On Fri, Jun 1, 2018 at 5:50 PM, Nicholas Chammas
>> >>  wrote:
>> >> > Building with -Phadoop-2.7 didn’t help, and if I remember correctly,
>> >> > building with -Phadoop-2.8 worked with hadoop-aws in the 2.3.0 release,
>> >> > so it appears something has changed since then.
>> >> >
>> >> > I wasn’t familiar with -Phadoop-cloud, but I can try that.
>> >> >
>> >> > My goal here is simply to confirm that this release of Spark works with
>> >> > hadoop-aws like past releases did, particularly for Flintrock users who
>> >> > use Spark with S3A.
>> >> >
>> >> > We currently provide -hadoop2.6, -hadoop2.7, and -without-hadoop builds
>> >> > with every Spark release. If the -hadoop2.7 release build won’t work with
>> >> > hadoop-aws anymore, are there plans to provide a new build type that
>> >> > will?
>> >> >
>> >> > Apologies if the question is poorly formed. I’m batting a bit outside my
>> >> > league here. Again, my goal is simply to confirm that I/my users still
>> >> > have a way to use s3a://. In the past, that way was simply to call pyspark
>> >> > --packages org.apache.hadoop:hadoop-aws:2.8.4 or something very similar.
>> >> > If that will no longer work, I’m trying to confirm that the change of
>> >> > behavior is intentional or acceptable (as a review for the Spark
>> >> > project) and figure out what I need to change (as due diligence for
>> >> > Flintrock’s users).
>> >> >
>> >> > Nick
>> >> >
>> >> >
>> >> > On Fri, Jun 1, 2018 at 8:21 PM Marcelo Vanzin 
>> >> > wrote:
>> >> >>
>> >> >> Using the hadoop-aws package is probably going to be a little more
>> >> >> complicated than that. The best bet is to use a custom build of Spark
>> >> >> that includes it (use -Phadoop-cloud). Otherwise you're probably
>> >> >> looking at some nasty dependency issues, especially if you end up
>> >> >> mixing different versions of Hadoop.
>> >> >>
>> >> >> On Fri, Jun 1, 2018 at 4:01 PM, Nicholas Chammas
>> >> >>  wrote:
>> >> >> > I was able to successfully launch a Spark cluster on EC2 at 2.3.1 RC4
>> >> >> > using Flintrock. However, trying to load the hadoop-aws package gave
>> >> >> > me some errors.
>> >> >> >
>> >> >> > $ pyspark --packages org.apache.hadoop:hadoop-aws:2.8.4
>> >> >> >
>> >> >> > 
>> >> >> >
>> >> >> > :: problems summary ::
>> >> >> >  WARNINGS
>> >> >> > [NOT FOUND  ] com.sun.jersey#jersey-json;1.9!jersey-json.jar(bundle) (2ms)
>> >> >> >  local-m2-cache: tried
>> >> >> > file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-json/1.9/jersey-json-1.9.jar
>> >> >> > [NOT FOUND  ] com.sun.jersey#jersey-server;1.9!jersey-server.jar(bundle) (0ms)
>> >> >> >  local-m2-cache: tried
>> >> >> > file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-server/1.9/jersey-server-1.9.jar
>> >> >> > [NOT FOUND  ] org.codehaus.jettison#jettison;1.1!jettison.jar(bundle) (1ms)
>> >> >> >  local-m2-cache: tried
>> >> >> > file:/home/ec2-user/.m2/repository/org/codehaus/jettison/jettison/1.1/jettison-1.1.jar
>> >> >> > [NOT FOUND  ] com.sun.xml.bind#jaxb-impl;2.2.3-1!jaxb-impl.jar (0ms)
>> >> >> >  local-m2-cache: tried
>> >> >> > file:/home/ec2-user/.m2/repository/com/sun/xml/bind/jaxb-impl/2.2.3-1/jaxb-impl-2.2.3-1.jar
>> >> >> >
>> >> >> > I’d guess I’m 

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-02 Thread Nicholas Chammas
I'll give that a try, but I'll still have to figure out what to do if none
of the release builds work with hadoop-aws, since Flintrock deploys Spark
release builds to set up a cluster. Building Spark is slow, so we only do
it if the user specifically requests a Spark version by git hash. (This is
basically how spark-ec2 did things, too.)

On Sat, Jun 2, 2018 at 6:54 PM Marcelo Vanzin  wrote:

> If you're building your own Spark, definitely try the hadoop-cloud
> profile. Then you don't even need to pull anything at runtime,
> everything is already packaged with Spark.
>
> On Fri, Jun 1, 2018 at 6:51 PM, Nicholas Chammas
>  wrote:
> > pyspark --packages org.apache.hadoop:hadoop-aws:2.7.3 didn’t work for me
> > either (even building with -Phadoop-2.7). I guess I’ve been relying on an
> > unsupported pattern and will need to figure something else out going
> > forward in order to use s3a://.
> >
> >
> > On Fri, Jun 1, 2018 at 9:09 PM Marcelo Vanzin 
> wrote:
> >>
> >> I have personally never tried to include hadoop-aws that way. But at
> >> the very least, I'd try to use the same version of Hadoop as the Spark
> >> build (2.7.3 IIRC). I don't really expect a different version to work,
> >> and if it did in the past it definitely was not by design.
> >>
> >> On Fri, Jun 1, 2018 at 5:50 PM, Nicholas Chammas
> >>  wrote:
> >> > Building with -Phadoop-2.7 didn’t help, and if I remember correctly,
> >> > building with -Phadoop-2.8 worked with hadoop-aws in the 2.3.0 release,
> >> > so it appears something has changed since then.
> >> >
> >> > I wasn’t familiar with -Phadoop-cloud, but I can try that.
> >> >
> >> > My goal here is simply to confirm that this release of Spark works with
> >> > hadoop-aws like past releases did, particularly for Flintrock users who
> >> > use Spark with S3A.
> >> >
> >> > We currently provide -hadoop2.6, -hadoop2.7, and -without-hadoop builds
> >> > with every Spark release. If the -hadoop2.7 release build won’t work with
> >> > hadoop-aws anymore, are there plans to provide a new build type that
> >> > will?
> >> >
> >> > Apologies if the question is poorly formed. I’m batting a bit outside my
> >> > league here. Again, my goal is simply to confirm that I/my users still
> >> > have a way to use s3a://. In the past, that way was simply to call pyspark
> >> > --packages org.apache.hadoop:hadoop-aws:2.8.4 or something very similar.
> >> > If that will no longer work, I’m trying to confirm that the change of
> >> > behavior is intentional or acceptable (as a review for the Spark
> >> > project) and figure out what I need to change (as due diligence for
> >> > Flintrock’s users).
> >> >
> >> > Nick
> >> >
> >> >
> >> > On Fri, Jun 1, 2018 at 8:21 PM Marcelo Vanzin 
> >> > wrote:
> >> >>
> >> >> Using the hadoop-aws package is probably going to be a little more
> >> >> complicated than that. The best bet is to use a custom build of Spark
> >> >> that includes it (use -Phadoop-cloud). Otherwise you're probably
> >> >> looking at some nasty dependency issues, especially if you end up
> >> >> mixing different versions of Hadoop.
> >> >>
> >> >> On Fri, Jun 1, 2018 at 4:01 PM, Nicholas Chammas
> >> >>  wrote:
> >> >> > I was able to successfully launch a Spark cluster on EC2 at 2.3.1 RC4
> >> >> > using Flintrock. However, trying to load the hadoop-aws package gave
> >> >> > me some errors.
> >> >> >
> >> >> > $ pyspark --packages org.apache.hadoop:hadoop-aws:2.8.4
> >> >> >
> >> >> > 
> >> >> >
> >> >> > :: problems summary ::
> >> >> >  WARNINGS
> >> >> > [NOT FOUND  ] com.sun.jersey#jersey-json;1.9!jersey-json.jar(bundle) (2ms)
> >> >> >  local-m2-cache: tried
> >> >> > file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-json/1.9/jersey-json-1.9.jar
> >> >> > [NOT FOUND  ] com.sun.jersey#jersey-server;1.9!jersey-server.jar(bundle) (0ms)
> >> >> >  local-m2-cache: tried
> >> >> > file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-server/1.9/jersey-server-1.9.jar
> >> >> > [NOT FOUND  ] org.codehaus.jettison#jettison;1.1!jettison.jar(bundle) (1ms)
> >> >> >  local-m2-cache: tried
> >> >> > file:/home/ec2-user/.m2/repository/org/codehaus/jettison/jettison/1.1/jettison-1.1.jar
> >> >> > [NOT FOUND  ] com.sun.xml.bind#jaxb-impl;2.2.3-1!jaxb-impl.jar (0ms)
> >> >> >  local-m2-cache: tried
> >> >> > file:/home/ec2-user/.m2/repository/com/sun/xml/bind/jaxb-impl/2.2.3-1/jaxb-impl-2.2.3-1.jar
> >> >> >
> >> >> > I’d guess I’m probably using the wrong version of hadoop-aws, but I
> >> >> > called
> >> >> > make-distribution.sh with -Phadoop-2.8 so I’m not sure what else to
> >> >> > try.
> >> >> >
> >> >> > Any quick pointers?
> >> 

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-02 Thread Wenchen Fan
+1

On Sun, Jun 3, 2018 at 6:54 AM, Marcelo Vanzin  wrote:

> If you're building your own Spark, definitely try the hadoop-cloud
> profile. Then you don't even need to pull anything at runtime,
> everything is already packaged with Spark.
>
> On Fri, Jun 1, 2018 at 6:51 PM, Nicholas Chammas
>  wrote:
> > pyspark --packages org.apache.hadoop:hadoop-aws:2.7.3 didn’t work for me
> > either (even building with -Phadoop-2.7). I guess I’ve been relying on an
> > unsupported pattern and will need to figure something else out going
> > forward in order to use s3a://.
> >
> >
> > On Fri, Jun 1, 2018 at 9:09 PM Marcelo Vanzin 
> wrote:
> >>
> >> I have personally never tried to include hadoop-aws that way. But at
> >> the very least, I'd try to use the same version of Hadoop as the Spark
> >> build (2.7.3 IIRC). I don't really expect a different version to work,
> >> and if it did in the past it definitely was not by design.
> >>
> >> On Fri, Jun 1, 2018 at 5:50 PM, Nicholas Chammas
> >>  wrote:
> >> > Building with -Phadoop-2.7 didn’t help, and if I remember correctly,
> >> > building with -Phadoop-2.8 worked with hadoop-aws in the 2.3.0 release,
> >> > so it appears something has changed since then.
> >> >
> >> > I wasn’t familiar with -Phadoop-cloud, but I can try that.
> >> >
> >> > My goal here is simply to confirm that this release of Spark works with
> >> > hadoop-aws like past releases did, particularly for Flintrock users who
> >> > use Spark with S3A.
> >> >
> >> > We currently provide -hadoop2.6, -hadoop2.7, and -without-hadoop builds
> >> > with every Spark release. If the -hadoop2.7 release build won’t work with
> >> > hadoop-aws anymore, are there plans to provide a new build type that
> >> > will?
> >> >
> >> > Apologies if the question is poorly formed. I’m batting a bit outside my
> >> > league here. Again, my goal is simply to confirm that I/my users still
> >> > have a way to use s3a://. In the past, that way was simply to call pyspark
> >> > --packages org.apache.hadoop:hadoop-aws:2.8.4 or something very similar.
> >> > If that will no longer work, I’m trying to confirm that the change of
> >> > behavior is intentional or acceptable (as a review for the Spark
> >> > project) and figure out what I need to change (as due diligence for
> >> > Flintrock’s users).
> >> >
> >> > Nick
> >> >
> >> >
> >> > On Fri, Jun 1, 2018 at 8:21 PM Marcelo Vanzin 
> >> > wrote:
> >> >>
> >> >> Using the hadoop-aws package is probably going to be a little more
> >> >> complicated than that. The best bet is to use a custom build of Spark
> >> >> that includes it (use -Phadoop-cloud). Otherwise you're probably
> >> >> looking at some nasty dependency issues, especially if you end up
> >> >> mixing different versions of Hadoop.
> >> >>
> >> >> On Fri, Jun 1, 2018 at 4:01 PM, Nicholas Chammas
> >> >>  wrote:
> >> >> > I was able to successfully launch a Spark cluster on EC2 at 2.3.1 RC4
> >> >> > using Flintrock. However, trying to load the hadoop-aws package gave
> >> >> > me some errors.
> >> >> >
> >> >> > $ pyspark --packages org.apache.hadoop:hadoop-aws:2.8.4
> >> >> >
> >> >> > 
> >> >> >
> >> >> > :: problems summary ::
> >> >> >  WARNINGS
> >> >> > [NOT FOUND  ] com.sun.jersey#jersey-json;1.9!jersey-json.jar(bundle) (2ms)
> >> >> >  local-m2-cache: tried
> >> >> > file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-json/1.9/jersey-json-1.9.jar
> >> >> > [NOT FOUND  ] com.sun.jersey#jersey-server;1.9!jersey-server.jar(bundle) (0ms)
> >> >> >  local-m2-cache: tried
> >> >> > file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-server/1.9/jersey-server-1.9.jar
> >> >> > [NOT FOUND  ] org.codehaus.jettison#jettison;1.1!jettison.jar(bundle) (1ms)
> >> >> >  local-m2-cache: tried
> >> >> > file:/home/ec2-user/.m2/repository/org/codehaus/jettison/jettison/1.1/jettison-1.1.jar
> >> >> > [NOT FOUND  ] com.sun.xml.bind#jaxb-impl;2.2.3-1!jaxb-impl.jar (0ms)
> >> >> >  local-m2-cache: tried
> >> >> > file:/home/ec2-user/.m2/repository/com/sun/xml/bind/jaxb-impl/2.2.3-1/jaxb-impl-2.2.3-1.jar
> >> >> >
> >> >> > I’d guess I’m probably using the wrong version of hadoop-aws, but I
> >> >> > called
> >> >> > make-distribution.sh with -Phadoop-2.8 so I’m not sure what else to
> >> >> > try.
> >> >> >
> >> >> > Any quick pointers?
> >> >> >
> >> >> > Nick
> >> >> >
> >> >> >
> >> >> > On Fri, Jun 1, 2018 at 6:29 PM Marcelo Vanzin 
> >> >> > wrote:
> >> >> >>
> >> >> >> Starting with my own +1 (binding).
> >> >> >>
> >> >> >> On Fri, Jun 1, 2018 at 3:28 PM, Marcelo Vanzin <van...@cloudera.com>
> >> >> >> wrote:
> >> >> >> > Please vote on releasing the 

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-02 Thread Marcelo Vanzin
If you're building your own Spark, definitely try the hadoop-cloud
profile. Then you don't even need to pull anything at runtime,
everything is already packaged with Spark.
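
Something along these lines should do it (a sketch only; double-check the
exact profiles and flags for your branch):

$ ./dev/make-distribution.sh --name hadoop-cloud --tgz -Phadoop-2.7 -Phadoop-cloud

That bakes the spark-hadoop-cloud module (which pulls in hadoop-aws) into the
distribution tarball, so there's nothing to fetch with --packages at runtime.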

On Fri, Jun 1, 2018 at 6:51 PM, Nicholas Chammas
 wrote:
> pyspark --packages org.apache.hadoop:hadoop-aws:2.7.3 didn’t work for me
> either (even building with -Phadoop-2.7). I guess I’ve been relying on an
> unsupported pattern and will need to figure something else out going forward
> in order to use s3a://.
>
>
> On Fri, Jun 1, 2018 at 9:09 PM Marcelo Vanzin  wrote:
>>
>> I have personally never tried to include hadoop-aws that way. But at
>> the very least, I'd try to use the same version of Hadoop as the Spark
>> build (2.7.3 IIRC). I don't really expect a different version to work,
>> and if it did in the past it definitely was not by design.
>>
>> On Fri, Jun 1, 2018 at 5:50 PM, Nicholas Chammas
>>  wrote:
>> > Building with -Phadoop-2.7 didn’t help, and if I remember correctly,
>> > building with -Phadoop-2.8 worked with hadoop-aws in the 2.3.0 release,
>> > so
>> > it appears something has changed since then.
>> >
>> > I wasn’t familiar with -Phadoop-cloud, but I can try that.
>> >
>> > My goal here is simply to confirm that this release of Spark works with
>> > hadoop-aws like past releases did, particularly for Flintrock users who
>> > use
>> > Spark with S3A.
>> >
>> > We currently provide -hadoop2.6, -hadoop2.7, and -without-hadoop builds
>> > with
>> > every Spark release. If the -hadoop2.7 release build won’t work with
>> > hadoop-aws anymore, are there plans to provide a new build type that
>> > will?
>> >
>> > Apologies if the question is poorly formed. I’m batting a bit outside my
>> > league here. Again, my goal is simply to confirm that I/my users still
>> > have
>> > a way to use s3a://. In the past, that way was simply to call pyspark
>> > --packages org.apache.hadoop:hadoop-aws:2.8.4 or something very similar.
>> > If
>> > that will no longer work, I’m trying to confirm that the change of
>> > behavior
>> > is intentional or acceptable (as a review for the Spark project) and
>> > figure
>> > out what I need to change (as due diligence for Flintrock’s users).
>> >
>> > Nick
>> >
>> >
>> > On Fri, Jun 1, 2018 at 8:21 PM Marcelo Vanzin 
>> > wrote:
>> >>
>> >> Using the hadoop-aws package is probably going to be a little more
>> >> complicated than that. The best bet is to use a custom build of Spark
>> >> that includes it (use -Phadoop-cloud). Otherwise you're probably
>> >> looking at some nasty dependency issues, especially if you end up
>> >> mixing different versions of Hadoop.
>> >>
>> >> On Fri, Jun 1, 2018 at 4:01 PM, Nicholas Chammas
>> >>  wrote:
>> >> > I was able to successfully launch a Spark cluster on EC2 at 2.3.1 RC4
>> >> > using
>> >> > Flintrock. However, trying to load the hadoop-aws package gave me
>> >> > some
>> >> > errors.
>> >> >
>> >> > $ pyspark --packages org.apache.hadoop:hadoop-aws:2.8.4
>> >> >
>> >> > 
>> >> >
>> >> > :: problems summary ::
>> >> >  WARNINGS
>> >> > [NOT FOUND  ] com.sun.jersey#jersey-json;1.9!jersey-json.jar(bundle) (2ms)
>> >> >  local-m2-cache: tried
>> >> > file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-json/1.9/jersey-json-1.9.jar
>> >> > [NOT FOUND  ] com.sun.jersey#jersey-server;1.9!jersey-server.jar(bundle) (0ms)
>> >> >  local-m2-cache: tried
>> >> > file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-server/1.9/jersey-server-1.9.jar
>> >> > [NOT FOUND  ] org.codehaus.jettison#jettison;1.1!jettison.jar(bundle) (1ms)
>> >> >  local-m2-cache: tried
>> >> > file:/home/ec2-user/.m2/repository/org/codehaus/jettison/jettison/1.1/jettison-1.1.jar
>> >> > [NOT FOUND  ] com.sun.xml.bind#jaxb-impl;2.2.3-1!jaxb-impl.jar (0ms)
>> >> >  local-m2-cache: tried
>> >> > file:/home/ec2-user/.m2/repository/com/sun/xml/bind/jaxb-impl/2.2.3-1/jaxb-impl-2.2.3-1.jar
>> >> >
>> >> > I’d guess I’m probably using the wrong version of hadoop-aws, but I
>> >> > called
>> >> > make-distribution.sh with -Phadoop-2.8 so I’m not sure what else to
>> >> > try.
>> >> >
>> >> > Any quick pointers?
>> >> >
>> >> > Nick
>> >> >
>> >> >
>> >> > On Fri, Jun 1, 2018 at 6:29 PM Marcelo Vanzin 
>> >> > wrote:
>> >> >>
>> >> >> Starting with my own +1 (binding).
>> >> >>
>> >> >> On Fri, Jun 1, 2018 at 3:28 PM, Marcelo Vanzin 
>> >> >> wrote:
>> >> >> > Please vote on releasing the following candidate as Apache Spark
>> >> >> > version
>> >> >> > 2.3.1.
>> >> >> >
>> >> >> > Given that I expect at least a few people to be busy with Spark
>> >> >> > Summit
>> >> >> > next
>> >> >> > week, I'm taking the liberty of setting an extended voting period.
>> >> >> > The
>> >> >> > vote
>> >> >> > will be open until Friday, June 8th, at 19:00 UTC 

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-02 Thread Sean Owen
+1 from me with the same comments as in the last RC.

On Fri, Jun 1, 2018 at 5:29 PM Marcelo Vanzin  wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 2.3.1.
>
> Given that I expect at least a few people to be busy with Spark Summit next
> week, I'm taking the liberty of setting an extended voting period. The vote
> will be open until Friday, June 8th, at 19:00 UTC (that's 12:00 PDT).
>
> It passes with a majority of +1 votes, which must include at least 3 +1
> votes
> from the PMC.
>
> [ ] +1 Release this package as Apache Spark 2.3.1
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.3.1-rc4 (commit 30aaa5a3):
> https://github.com/apache/spark/tree/v2.3.1-rc4
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc4-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1272/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc4-docs/
>
> The list of bug fixes going into 2.3.1 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12342432
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running it on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks; in Java/Scala,
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward). A
> minimal sketch of these steps follows at the end of this message.
>
> ===
> What should happen to JIRA tickets still targeting 2.3.1?
> ===
>
> The current list of open tickets targeted at 2.3.1 can be found at:
> https://s.apache.org/Q3Uo
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
>
> --
> Marcelo
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>
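
A minimal sketch of the verification and PySpark testing steps above, with
the caveat that the exact artifact names under the -bin directory may differ
(check the directory listing first; the names below are assumptions):

$ # Verify a release artifact against the Spark KEYS file
$ wget https://dist.apache.org/repos/dist/dev/spark/KEYS
$ gpg --import KEYS
$ gpg --verify spark-2.3.1-bin-hadoop2.7.tgz.asc spark-2.3.1-bin-hadoop2.7.tgz

$ # PySpark smoke test in a clean virtual env
$ virtualenv rc-test && source rc-test/bin/activate
$ pip install https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc4-bin/pyspark-2.3.1.tar.gz

For Java/Scala, add the staging repository URL above
(https://repository.apache.org/content/repositories/orgapachespark-1272/) to
your build's resolvers and depend on the 2.3.1 artifacts as usual.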