Re: Revisiting Online serving of Spark models?

2018-06-11 Thread Holden Karau
So I kicked off a thread on user@ to collect people's feedback there, but
I'll summarize the offline results later this week too.

On Tue, Jun 12, 2018, 5:03 AM Liang-Chi Hsieh  wrote:

>
> Hi,
>
> It'd be great if the offline discussion could be shared.
> Thanks!

Re: Revisiting Online serving of Spark models?

2018-06-11 Thread Liang-Chi Hsieh


Hi,

It'd be great if the offline discussion could be shared. Thanks!



Holden Karau wrote
> We’re by the registration sign and are going to start walking over at 4:05
> 
> On Wed, Jun 6, 2018 at 2:43 PM Maximiliano Felice <maximilianofelice@> wrote:
> 
>> Hi!
>>
>> Do we meet at the entrance?
>>
>> See you
>>
>>
>> On Tue, Jun 5, 2018 at 3:07 PM, Nick Pentreath <nick.pentreath@> wrote:
>>
>>> I will aim to join up at 4pm tomorrow (Wed) too. Look forward to it.
>>>
>>> On Sun, 3 Jun 2018 at 00:24 Holden Karau <holden@> wrote:
>>>
 On Sat, Jun 2, 2018 at 8:39 PM, Maximiliano Felice <maximilianofelice@> wrote:

> Hi!
>
> We're already in San Francisco waiting for the summit. We even think
> that we spotted @holdenk this afternoon.
>
 Unless you happened to be walking by my garage probably not super
 likely, spent the day working on scooters/motorcycles (my style is a little
 less unique in SF :)). Also if you see me feel free to say hi unless I look
 like I haven't had my first coffee of the day, love chatting with folks IRL
 :)

>
> @chris, we're really interested in the Meetup you're hosting. My team
> will probably join it from the beginning if you have room for us, and
> I'll join it later after discussing the topics on this thread. I'll send
> you an email regarding this request.
>
> Thanks
>
> On Fri, Jun 1, 2018 at 7:26 AM, Saikat Kanjilal <sxk1969@> wrote:
>
>> @Chris This sounds fantastic, please send summary notes for Seattle folks
>>
>> @Felix I work in downtown Seattle, am wondering if we should hold a tech
>> meetup around model serving in spark at my work or elsewhere close,
>> thoughts?  I’m actually in the midst of building microservices to manage
>> models and when I say models I mean much more than machine learning models
>> (think OR, process models as well)
>>
>> Regards
>>
>> Sent from my iPhone
>>
>> On May 31, 2018, at 10:32 PM, Chris Fregly <chris@> wrote:
>>
>> Hey everyone!
>>
>> @Felix:  thanks for putting this together.  i sent some of you a quick
>> calendar event - mostly for me, so i don’t forget!  :)
>>
>> Coincidentally, this is the focus of June 6th's *Advanced Spark and
>> TensorFlow Meetup*
>> https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/250924195/
>> @5:30pm on June 6th (same night) here in SF!
>>
>> Everybody is welcome to come.  Here’s the link to the meetup that
>> includes the signup link:
>> https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/250924195/
>>
>> We have an awesome lineup of speakers covering a lot of deep,
>> technical ground.
>>
>> For those who can’t attend in person, we’ll be broadcasting live - and
>> posting the recording afterward.
>>
>> All details are in the meetup link above…
>>
>> @holden/felix/nick/joseph/maximiliano/saikat/leif:  you’re more than
>> welcome to give a talk. I can move things around to make room.
>>
>> @joseph:  I’d personally like an update on the direction of the
>> Databricks proprietary ML Serving export format which is similar to PMML
>> but not a standard in any way.
>>
>> Also, the Databricks ML Serving Runtime is only available to
>> Databricks customers.  This seems in conflict with the community efforts
>> described here.  Can you comment on behalf of Databricks?
>>
>> Look forward to your response, joseph.
>>
>> See you all soon!
>>
>> —
>>
>>
>> *Chris Fregly *Founder @ *PipelineAI* https://pipeline.ai/ (100,000 Users)
>> Organizer @ *Advanced Spark and TensorFlow Meetup*
>> https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/ (85,000 Global Members)
>>
>>
>>
>> *San Francisco - Chicago - Austin -
>> Washington DC - London - Dusseldorf *
>> *Try our PipelineAI Community Edition with GPUs and TPUs!!
>> http://community.pipeline.ai/*
>>
>>
>> On May 30, 2018, at 9:32 AM, Felix Cheung <felixcheung_m@> wrote:
>>
>> Hi!
>>
>> Thank you! Let’s meet then
>>
>> June 6 4pm
>>
>> Moscone West Convention Center
>> 800 Howard Street, San Francisco, CA 94103
>> https://maps.google.com/?q=800+Howard+Street,+San+Francisco,+CA+94103&entry=gmail&source=g
>>
>> Ground floor (outside of conference area - should be available for
>> all) - we will meet and decide where to go
>>
>> (Would not send invite because that would be too much noise for dev@)
>>
>> To paraphrase Joseph, we 

[ANNOUNCE] Announcing Apache Spark 2.3.1

2018-06-11 Thread Marcelo Vanzin
We are happy to announce the availability of Spark 2.3.1!

Apache Spark 2.3.1 is a maintenance release, based on the branch-2.3
maintenance branch of Spark. We strongly recommend all 2.3.x users to
upgrade to this stable release.

To download Spark 2.3.1, head over to the download page:
http://spark.apache.org/downloads.html

To view the release notes:
https://spark.apache.org/releases/spark-release-2-3-1.html

We would like to acknowledge all community members for contributing to
this release. This release would not have been possible without you.


-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



[build system] DOWNTIME ALERT! jenkins will be down all day june 16th (saturday)

2018-06-11 Thread shane knapp
hey everyone!

we have another power "event" for our building on campus...  this is to
both fix the high-voltage lead that the city of berkeley accidentally cut
last year during construction, as well as to install two new UPS systems in
one of our on-prem machine rooms.

while jenkins will still be up, it won't be reachable as the machine
hosting the reverse-proxy will be down.

downtime begins ~11pm this friday (june 15th), and everything should be
back up saturday evening (june 16th).

i will be out of town, but one of our sysadmins will be checking on jenkins
while i'm away.

shane
-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Very slow complex type column reads from parquet

2018-06-11 Thread Jakub Wozniak
Hello,

We have run into severely degraded performance when reading complex-type
(struct, array) columns stored in Parquet.
The Parquet file is around 600 MB (snappy) with ~400k rows and has a field of a
complex type { f1: array of ints, f2: array of ints }, where the f1 array
length is 50k elements.
There are also other fields such as entity_id: long and timestamp: long.

A simple query that selects rows using the predicates entity_id = X and
timestamp >= T1 and timestamp <= T2, followed by ds.show(), takes 17 minutes
to execute.
If we remove the complex-type columns from the query, it executes in
sub-second time.
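
To put these figures in perspective, here is a rough back-of-the-envelope
sketch in Python. It is not a measurement: it assumes 4-byte ints and the
worst case in which the f1 array of every row is decoded, and the function
names (f1_elements, raw_gib, elements_per_second) are purely illustrative.

```python
# Back-of-the-envelope sizing for the dataset described above.
# Row count, array length and query time are taken from this message;
# the 4-byte int width and "every row is decoded" are assumptions.

ROWS = 400_000           # ~400k rows in the file
F1_LENGTH = 50_000       # elements in each f1 array
INT_BYTES = 4            # assumed width of one int value
QUERY_SECONDS = 17 * 60  # observed query time (17 minutes)


def f1_elements(rows: int = ROWS, length: int = F1_LENGTH) -> int:
    """Total number of f1 values if every row is read."""
    return rows * length


def raw_gib(elements: int, width: int = INT_BYTES) -> float:
    """Uncompressed size of those values in GiB."""
    return elements * width / 2**30


def elements_per_second(elements: int, seconds: int = QUERY_SECONDS) -> float:
    """Average decode rate implied by the observed query time."""
    return elements / seconds


if __name__ == "__main__":
    n = f1_elements()
    print(f"f1 values (worst case): {n:,}")
    print(f"uncompressed size:      {raw_gib(n):.1f} GiB")
    print(f"implied decode rate:    {elements_per_second(n):,.0f} values/s")
```

Under these assumptions f1 alone accounts for 20 billion values. How many of
them are actually decoded depends on how many rows the predicates allow the
reader to skip, so these numbers are upper bounds only.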
  
Looking at the implementation of the Parquet datasource, the Vectorized*
classes are used only if the read types are primitives. Otherwise the code
falls back to the default parquet-mr implementation.
In VectorizedParquetRecordReader there is a TODO to handle complex types
that "should be efficient & easy with codegen".

For our Spark usage at CERN, the current execution times are pretty much
prohibitive, as a lot of our data is stored as arrays / complex types.
The 600 MB file represents 1 day of measurements, and our data scientists
would sometimes like to process months or even years of them.

Could you please let me know whether anybody is currently working on this, or
whether it is on the roadmap for the future?
Or could you give me some suggestions on how to avoid / resolve this
problem? I’m using Spark 2.2.1.

Best regards,
Jakub Wozniak



