Re: Revisiting Online serving of Spark models?
So I kicked off a thread on user@ to collect people's feedback there, but I'll summarize the offline results later this week too.

On Tue, Jun 12, 2018, 5:03 AM Liang-Chi Hsieh wrote:

> Hi,
>
> It'd be great if there can be any sharing of the offline discussion.
> Thanks!
Re: Revisiting Online serving of Spark models?
Hi,

It'd be great if there can be any sharing of the offline discussion. Thanks!

Holden Karau wrote:
> We’re by the registration sign, going to start walking over at 4:05
>
> On Wed, Jun 6, 2018 at 2:43 PM Maximiliano Felice <maximilianofelice@> wrote:
>
>> Hi!
>>
>> Do we meet at the entrance?
>>
>> See you
>>
>> On Tue, Jun 5, 2018, 3:07 PM, Nick Pentreath <nick.pentreath@> wrote:
>>
>>> I will aim to join up at 4pm tomorrow (Wed) too. Look forward to it.
>>>
>>> On Sun, 3 Jun 2018 at 00:24 Holden Karau <holden@> wrote:
>>>
>>>> On Sat, Jun 2, 2018 at 8:39 PM, Maximiliano Felice <maximilianofelice@> wrote:
>>>>
>>>>> Hi!
>>>>>
>>>>> We're already in San Francisco waiting for the summit. We even think
>>>>> that we spotted @holdenk this afternoon.
>>>>
>>>> Unless you happened to be walking by my garage, probably not super
>>>> likely; spent the day working on scooters/motorcycles (my style is a
>>>> little less unique in SF :)). Also, if you see me, feel free to say hi
>>>> unless I look like I haven't had my first coffee of the day; love
>>>> chatting with folks IRL :)
>>>>
>>>>> @chris, we're really interested in the Meetup you're hosting. My team
>>>>> will probably join it from the beginning if you have room for us, and
>>>>> I'll join it later after discussing the topics on this thread. I'll
>>>>> send you an email regarding this request.
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Fri, Jun 1, 2018, 7:26 AM, Saikat Kanjilal <sxk1969@> wrote:
>>>>>
>>>>>> @Chris This sounds fantastic, please send summary notes for Seattle
>>>>>> folks
>>>>>>
>>>>>> @Felix I work in downtown Seattle and am wondering if we should host a
>>>>>> tech meetup around model serving in Spark at my work or elsewhere
>>>>>> close by, thoughts? I’m actually in the midst of building
>>>>>> microservices to manage models, and when I say models I mean much
>>>>>> more than machine learning models (think OR, process models as well)
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> On May 31, 2018, at 10:32 PM, Chris Fregly <chris@> wrote:
>>>>>>
>>>>>>> Hey everyone!
>>>>>>>
>>>>>>> @Felix: thanks for putting this together. I sent some of you a quick
>>>>>>> calendar event - mostly for me, so I don’t forget! :)
>>>>>>>
>>>>>>> Coincidentally, this is the focus of June 6th's *Advanced Spark and
>>>>>>> TensorFlow Meetup*
>>>>>>> (https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/250924195/)
>>>>>>> @5:30pm on June 6th (same night) here in SF!
>>>>>>>
>>>>>>> Everybody is welcome to come. Here’s the link to the meetup that
>>>>>>> includes the signup link:
>>>>>>> https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/250924195/
>>>>>>>
>>>>>>> We have an awesome lineup of speakers covering a lot of deep,
>>>>>>> technical ground.
>>>>>>>
>>>>>>> For those who can’t attend in person, we’ll be broadcasting live and
>>>>>>> posting the recording afterward.
>>>>>>>
>>>>>>> All details are in the meetup link above…
>>>>>>>
>>>>>>> @holden/felix/nick/joseph/maximiliano/saikat/leif: you’re more than
>>>>>>> welcome to give a talk. I can move things around to make room.
>>>>>>>
>>>>>>> @joseph: I’d personally like an update on the direction of the
>>>>>>> Databricks proprietary ML Serving export format, which is similar to
>>>>>>> PMML but not a standard in any way.
>>>>>>>
>>>>>>> Also, the Databricks ML Serving Runtime is only available to
>>>>>>> Databricks customers. This seems in conflict with the community
>>>>>>> efforts described here. Can you comment on behalf of Databricks?
>>>>>>>
>>>>>>> Look forward to your response, joseph.
>>>>>>>
>>>>>>> See you all soon!
>>>>>>>
>>>>>>> --
>>>>>>> *Chris Fregly* Founder @ *PipelineAI* https://pipeline.ai/ (100,000 Users)
>>>>>>> Organizer @ *Advanced Spark and TensorFlow Meetup*
>>>>>>> https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/ (85,000 Global Members)
>>>>>>> *San Francisco - Chicago - Austin - Washington DC - London - Dusseldorf*
>>>>>>> *Try our PipelineAI Community Edition with GPUs and TPUs!! http://community.pipeline.ai/*
>>>>>>>
>>>>>>> On May 30, 2018, at 9:32 AM, Felix Cheung <felixcheung_m@> wrote:
>>>>>>>
>>>>>>>> Hi!
>>>>>>>>
>>>>>>>> Thank you! Let’s meet then
>>>>>>>>
>>>>>>>> June 6 4pm
>>>>>>>>
>>>>>>>> Moscone West Convention Center
>>>>>>>> 800 Howard Street, San Francisco, CA 94103
>>>>>>>> https://maps.google.com/?q=800+Howard+Street,+San+Francisco,+CA+94103
>>>>>>>>
>>>>>>>> Ground floor (outside of conference area - should be available for
>>>>>>>> all) - we will meet and decide where to go
>>>>>>>>
>>>>>>>> (Would not send invite because that would be too much noise for dev@)
>>>>>>>>
>>>>>>>> To paraphrase Joseph, we
[ANNOUNCE] Announcing Apache Spark 2.3.1
We are happy to announce the availability of Spark 2.3.1!

Apache Spark 2.3.1 is a maintenance release, based on the branch-2.3 maintenance branch of Spark. We strongly recommend that all 2.3.x users upgrade to this stable release.

To download Spark 2.3.1, head over to the download page: http://spark.apache.org/downloads.html

To view the release notes: https://spark.apache.org/releases/spark-release-2-3-1.html

We would like to acknowledge all community members for contributing to this release. This release would not have been possible without you.

--
Marcelo

- To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
[build system] DOWNTIME ALERT! jenkins will be down all day june 16th (saturday)
hey everyone!

we have another power "event" for our building on campus... this is both to fix the high-voltage lead that the city of berkeley accidentally cut last year during construction and to install two new UPS systems in one of our on-prem machine rooms.

while jenkins itself will still be up, it won't be reachable, as the machine hosting the reverse proxy will be down.

downtime begins ~11pm this friday (june 15th), and everything should be back up saturday evening (june 16th).

i will be out of town, but one of our sysadmins will be checking on jenkins while i'm away.

shane
--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu
Very slow complex type column reads from parquet
Hello,

We have run into severely degraded performance when reading columns of complex (struct, array) types stored in Parquet.

A Parquet file of around 600 MB (snappy-compressed) holds ~400k rows and includes a field of complex type { f1: array of ints, f2: array of ints }, where the f1 array has 50k elements. There are also other fields, such as entity_id: long and timestamp: long.

A simple query that selects rows using the predicates entity_id = X and timestamp >= T1 and timestamp <= T2, followed by ds.show(), takes 17 minutes to execute. If we remove the complex-type columns from the query, it executes in under a second.

Looking at the implementation of the Parquet data source, the Vectorized* classes are used only if the read types are primitives; otherwise, the code falls back to the default parquet-mr implementation. In VectorizedParquetRecordReader there is a TODO to handle complex types, which "should be efficient & easy with codegen".

For our Spark usage at CERN, the current execution times are prohibitive, as a lot of our data is stored as arrays / complex types. The 600 MB file represents one day of measurements, and our data scientists sometimes want to process months or even years of them.

Could you please let me know whether anybody is currently working on this, or whether it is on the roadmap? Alternatively, could you give me some suggestions on how to avoid or work around this problem?

I'm using Spark 2.2.1.

Best regards,
Jakub Wozniak
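The fallback described in this report can be illustrated with a small model. The sketch below is plain Python, not Spark code: the type classes, the `uses_vectorized_reader` helper and the `measurement` column name are hypothetical stand-ins. It assumes the Spark 2.2-era rule that the Parquet source picks the vectorized reader only when `spark.sql.parquet.enableVectorizedReader` is on and every column in the read schema is an atomic type; Spark's actual code has further conditions (for example around whole-stage codegen) that are omitted here.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical, simplified stand-ins for Spark SQL data types (NOT the real
# pyspark.sql.types classes): atomic types are represented as plain strings.
ATOMIC_TYPES = {"int", "long", "double", "string", "boolean", "timestamp"}


@dataclass(frozen=True)
class ArrayType:
    element_type: object


@dataclass(frozen=True)
class StructType:
    fields: Tuple[Tuple[str, object], ...]


def is_atomic(data_type: object) -> bool:
    # Only the flat, primitive-like types count as atomic in this model.
    return isinstance(data_type, str) and data_type in ATOMIC_TYPES


def uses_vectorized_reader(read_schema: List[Tuple[str, object]],
                           vectorized_enabled: bool = True) -> bool:
    # Batch (vectorized) decoding only when EVERY requested column is atomic;
    # a single struct/array column sends the whole scan to the row-based path.
    return vectorized_enabled and all(is_atomic(t) for _, t in read_schema)


# Schema shaped like the one in the report; "measurement" is a made-up name
# for the struct column holding the two int arrays.
measurement = StructType(fields=(("f1", ArrayType("int")), ("f2", ArrayType("int"))))

primitive_only = [("entity_id", "long"), ("timestamp", "long")]
with_complex = primitive_only + [("measurement", measurement)]

print(uses_vectorized_reader(primitive_only))  # True
print(uses_vectorized_reader(with_complex))    # False
```

Under this rule the check covers the whole read schema rather than individual columns, so projecting the struct column would send `entity_id` and `timestamp` through the row-by-row parquet-mr path as well, with each row's large arrays assembled before the filter can discard the row. That is one plausible contributor to the gap between the 17-minute and sub-second runs, not a confirmed diagnosis.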