Re: ml models distribution

2016-07-22 Thread Chris Fregly
hey everyone-

this concept of deploying your Spark ML Pipelines and Algos into Production
(user-facing production) has been coming up a lot recently.

so much so that i've dedicated the last few months of my research and
engineering efforts to build out the infrastructure to support this in a
highly-scalable, highly-available way.

i've combined my Netflix + NetflixOSS work experience with my
Databricks/IBM + Spark work experience into an open source project,
PipelineIO, here:  http://pipeline.io

we're even serving up TensorFlow AI models using the same infrastructure -
incorporating key patterns from TensorFlow Distributed + TensorFlow Serving!

everything is open source, based on Docker + Kubernetes + NetflixOSS +
Spark + TensorFlow + Redis + Kafka + Zeppelin + Jupyter/IPython, deployable
to hybrid cloud or on-premise, with a heavy emphasis on metrics and
monitoring of models and production server statistics.

we're doing code generation directly from the saved Spark ML models (thanks
Spark 2.0 for giving us save/load parity across all models!) for optimized
model serving using both CPUs and GPUs, incremental training of models,
autoscaling, the whole works.
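
to make that concrete, here's a minimal Scala sketch of the vanilla Spark
2.0 save/load round trip itself (not PipelineIO code) - the column names,
training DataFrame, and paths are illustrative:

    import org.apache.spark.ml.{Pipeline, PipelineModel}
    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.feature.VectorAssembler

    // assemble raw columns into a feature vector, then fit a simple model;
    // trainingDF is an assumed DataFrame with columns f1, f2, and label
    val assembler = new VectorAssembler()
      .setInputCols(Array("f1", "f2"))
      .setOutputCol("features")
    val lr = new LogisticRegression().setLabelCol("label")
    val model: PipelineModel = new Pipeline()
      .setStages(Array(assembler, lr))
      .fit(trainingDF)

    // persist the fitted pipeline; any MLWritable model saves the same way
    model.write.overwrite().save("hdfs:///models/example/v1")

    // later, in a separate serving process, load it back with full parity
    val restored = PipelineModel.load("hdfs:///models/example/v1")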

our friend from Netflix, Chaos Monkey, even makes a grim appearance from
time to time to prove that we're resilient to failure.

take a peek.  it's cool.  we've come a long way in the last couple months,
and we've got a lot of work left to do, but the core infrastructure is in
place, key features have been built, and we're moving quickly.

shoot me an email if you'd like to get involved.  lots of TODOs.

we're dedicating my upcoming Advanced Spark and TensorFlow Meetup on August
4th in SF to demo'ing this infrastructure to you all.

here's the link:
http://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/231457813/


video recording + screen capture will be posted afterward, as always.

we've got a workshop dedicated to building an end-to-end Spark ML and
Kafka-based Recommendation Pipeline - including the PipelineIO serving
platform.  link is here:  http://pipeline.io

and i'm finishing a blog post soon to detail everything we've done so far -
and everything we're actively building.  this post will be available on
http://pipeline.io - as well as cross-posted to a number of my favorite
engineering blogs.

global demo roadshow starts 8/8.  shoot me an email if you want to see all
this in action, otherwise i'll see you at a workshop or meetup near you!  :)



On Fri, Jul 22, 2016 at 10:34 AM, Inam Ur Rehman 
wrote:

> Hello guys.. I know it's irrelevant to this topic, but I've been looking
> desperately for a solution. I am facing an exception:
> http://apache-spark-user-list.1001560.n3.nabble.com/how-to-resolve-you-must-build-spark-with-hive-exception-td27390.html
>
> Please help me.. I couldn't find any solution.
>
> On Fri, Jul 22, 2016 at 6:12 PM, Sean Owen  wrote:
>
>> No, there isn't anything in particular, beyond the various bits of
>> serialization support that write out something to put in your storage
>> to begin with. What you do with it after reading and before writing is
>> up to your app, on purpose.
>>
>> If you mean you're producing data outside the model that your model
>> uses, your model data might be produced by an RDD operation, and saved
>> that way. There it's no different than anything else you do with RDDs.
>>
>> What part are you looking to automate beyond those things? That's most
>> of it.
>>
>> On Fri, Jul 22, 2016 at 2:04 PM, Sergio Fernández 
>> wrote:
>> > Hi Sean,
>> >
>> > On Fri, Jul 22, 2016 at 12:52 PM, Sean Owen  wrote:
>> >>
>> >> If you mean, how do you distribute a new model in your application,
>> >> then there's no magic to it. Just reference the new model in the
>> >> functions you're executing in your driver.
>> >>
>> >> If you implemented some other manual way of deploying model info, just
>> >> do that again. There's no special thing to know.
>> >
>> >
>> > Well, because some models are huge, we typically bundle the logic
>> > (pipeline/application) and the models separately. Normally we use a
>> > shared store (e.g., HDFS) or coordinated distribution of the models.
>> > But I wanted to know if there is any infrastructure in Spark that
>> > specifically addresses such a need.
>> >
>> > Thanks.
>> >
>> > Cheers,
>> >
>> > P.S.: sorry Jacek, with "ml" I meant "Machine Learning". I thought it
>> > was a quite widespread acronym. Sorry for the possible confusion.
>> >
>> >
>> > --
>> > Sergio Fernández
>> > Partner Technology Manager
>> > Redlink GmbH
>> > m: +43 6602747925
>> > e: sergio.fernan...@redlink.co
>> > w: http://redlink.co
>>
>


-- 
*Chris Fregly*
Research Scientist @ PipelineIO
San Francisco, CA
pipeline.io
advancedspark.com


Re: ml models distribution

2016-07-22 Thread Inam Ur Rehman
Hello guys.. I know it's irrelevant to this topic, but I've been looking
desperately for a solution. I am facing an exception:
http://apache-spark-user-list.1001560.n3.nabble.com/how-to-resolve-you-must-build-spark-with-hive-exception-td27390.html

Please help me.. I couldn't find any solution.

On Fri, Jul 22, 2016 at 6:12 PM, Sean Owen  wrote:

> No, there isn't anything in particular, beyond the various bits of
> serialization support that write out something to put in your storage
> to begin with. What you do with it after reading and before writing is
> up to your app, on purpose.
>
> If you mean you're producing data outside the model that your model
> uses, your model data might be produced by an RDD operation, and saved
> that way. There it's no different than anything else you do with RDDs.
>
> What part are you looking to automate beyond those things? That's most
> of it.
>
> On Fri, Jul 22, 2016 at 2:04 PM, Sergio Fernández 
> wrote:
> > Hi Sean,
> >
> > On Fri, Jul 22, 2016 at 12:52 PM, Sean Owen  wrote:
> >>
> >> If you mean, how do you distribute a new model in your application,
> >> then there's no magic to it. Just reference the new model in the
> >> functions you're executing in your driver.
> >>
> >> If you implemented some other manual way of deploying model info, just
> >> do that again. There's no special thing to know.
> >
> >
> > Well, because some models are huge, we typically bundle the logic
> > (pipeline/application) and the models separately. Normally we use a
> > shared store (e.g., HDFS) or coordinated distribution of the models.
> > But I wanted to know if there is any infrastructure in Spark that
> > specifically addresses such a need.
> >
> > Thanks.
> >
> > Cheers,
> >
> > P.S.: sorry Jacek, with "ml" I meant "Machine Learning". I thought it
> > was a quite widespread acronym. Sorry for the possible confusion.
> >
> >
> > --
> > Sergio Fernández
> > Partner Technology Manager
> > Redlink GmbH
> > m: +43 6602747925
> > e: sergio.fernan...@redlink.co
> > w: http://redlink.co
>


Re: ml models distribution

2016-07-22 Thread Sean Owen
No, there isn't anything in particular, beyond the various bits of
serialization support that write out something to put in your storage
to begin with. What you do with it after reading and before writing is
up to your app, on purpose.

If you mean you're producing data outside the model that your model
uses, your model data might be produced by an RDD operation, and saved
that way. There it's no different than anything else you do with RDDs.

What part are you looking to automate beyond those things? That's most of it.
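
To make that concrete, here's a rough Scala sketch of the pattern - ALS is
just an example, the paths are illustrative, and ratingsDF is an assumed
input DataFrame; none of this is special Spark infrastructure:

    import org.apache.spark.ml.recommendation.ALS
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("model-data-as-data").getOrCreate()

    // fit a model; ratingsDF is an assumed DataFrame of user, item, rating
    val model = new ALS()
      .setUserCol("user").setItemCol("item").setRatingCol("rating")
      .fit(ratingsDF)

    // the learned factors are ordinary DataFrames; write them like any data
    model.itemFactors.write.mode("overwrite").parquet("hdfs:///models/als/itemFactors")
    model.userFactors.write.mode("overwrite").parquet("hdfs:///models/als/userFactors")

    // reading back is a plain DataFrame read; what you do next is up to the app
    val itemFactors = spark.read.parquet("hdfs:///models/als/itemFactors")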

On Fri, Jul 22, 2016 at 2:04 PM, Sergio Fernández  wrote:
> Hi Sean,
>
> On Fri, Jul 22, 2016 at 12:52 PM, Sean Owen  wrote:
>>
>> If you mean, how do you distribute a new model in your application,
>> then there's no magic to it. Just reference the new model in the
>> functions you're executing in your driver.
>>
>> If you implemented some other manual way of deploying model info, just
>> do that again. There's no special thing to know.
>
>
> Well, because some models are huge, we typically bundle the logic
> (pipeline/application) and the models separately. Normally we use a shared
> store (e.g., HDFS) or coordinated distribution of the models. But I wanted
> to know if there is any infrastructure in Spark that specifically addresses
> such a need.
>
> Thanks.
>
> Cheers,
>
> P.S.: sorry Jacek, with "ml" I meant "Machine Learning". I thought it was
> a quite widespread acronym. Sorry for the possible confusion.
>
>
> --
> Sergio Fernández
> Partner Technology Manager
> Redlink GmbH
> m: +43 6602747925
> e: sergio.fernan...@redlink.co
> w: http://redlink.co




Re: ml models distribution

2016-07-22 Thread Sergio Fernández
Hi Sean,

On Fri, Jul 22, 2016 at 12:52 PM, Sean Owen  wrote:
>
> If you mean, how do you distribute a new model in your application,
> then there's no magic to it. Just reference the new model in the
> functions you're executing in your driver.
>
> If you implemented some other manual way of deploying model info, just
> do that again. There's no special thing to know.
>

Well, because some models are huge, we typically bundle the logic
(pipeline/application) and the models separately. Normally we use a shared
store (e.g., HDFS) or coordinated distribution of the models. But I wanted
to know if there is any infrastructure in Spark that specifically addresses
such a need.
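
For reference, the pattern looks roughly like this Scala sketch, where the
model path is the only thing that changes on upgrade ("model.path" is a
hypothetical property name and inputDF an assumed DataFrame):

    import org.apache.spark.ml.PipelineModel

    // resolve the current model version from configuration, not from the jar
    val modelPath = sys.props.getOrElse("model.path",
      "hdfs:///models/pipeline/current")
    val model = PipelineModel.load(modelPath)

    // the same application code serves whichever version the path points at
    val scored = model.transform(inputDF)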

Thanks.

Cheers,

P.S.: sorry Jacek, with "ml" I meant "Machine Learning". I thought it was a
quite widespread acronym. Sorry for the possible confusion.


-- 
Sergio Fernández
Partner Technology Manager
Redlink GmbH
m: +43 6602747925
e: sergio.fernan...@redlink.co
w: http://redlink.co


Re: ml models distribution

2016-07-22 Thread Jacek Laskowski
Hehe, Sean. I knew that (and I knew the answer), but meant to ask a
co-question to help find the answer *together* :)

Regards,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Fri, Jul 22, 2016 at 12:52 PM, Sean Owen  wrote:
> Machine Learning
>
> If you mean, how do you distribute a new model in your application,
> then there's no magic to it. Just reference the new model in the
> functions you're executing in your driver.
>
> If you implemented some other manual way of deploying model info, just
> do that again. There's no special thing to know.
>
> On Fri, Jul 22, 2016 at 11:39 AM, Jacek Laskowski  wrote:
>> Hi,
>>
>> What's an ML model?
>>
>> (I'm sure once we find out the answer, you'll know the answer to your
>> question :))
>>
>> Regards,
>> Jacek Laskowski
>> 
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>>
>>
>> On Fri, Jul 22, 2016 at 11:49 AM, Sergio Fernández  wrote:
>>> Hi,
>>>
>>> I have one question:
>>>
>>> How is the distribution of ML models done across all nodes of a Spark cluster?
>>>
>>> I'm thinking about scenarios where the pipeline implementation does not
>>> necessarily need to change, but the models have been upgraded.
>>>
>>> Thanks in advance.
>>>
>>> Best regards,
>>>
>>> --
>>> Sergio Fernández
>>> Partner Technology Manager
>>> Redlink GmbH
>>> m: +43 6602747925
>>> e: sergio.fernan...@redlink.co
>>> w: http://redlink.co
>>




Re: ml models distribution

2016-07-22 Thread Sean Owen
Machine Learning

If you mean, how do you distribute a new model in your application,
then there's no magic to it. Just reference the new model in the
functions you're executing in your driver.

If you implemented some other manual way of deploying model info, just
do that again. There's no special thing to know.
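
As a rough Scala illustration of "just reference it" (spark.mllib here; sc,
featureRDD, and the path are assumed, and no special deployment API is
involved):

    import org.apache.spark.mllib.classification.LogisticRegressionModel
    import org.apache.spark.mllib.linalg.Vector

    // load the upgraded model in the driver; the path is illustrative
    val model = LogisticRegressionModel.load(sc, "hdfs:///models/lr/v2")

    // featureRDD is an assumed RDD[Vector]; the closure below captures
    // `model`, so Spark ships a copy to executors with the tasks - that
    // is the whole "distribution" story
    val predictions = featureRDD.map(v => model.predict(v))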

On Fri, Jul 22, 2016 at 11:39 AM, Jacek Laskowski  wrote:
> Hi,
>
> What's an ML model?
>
> (I'm sure once we find out the answer, you'll know the answer to your
> question :))
>
> Regards,
> Jacek Laskowski
> 
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Fri, Jul 22, 2016 at 11:49 AM, Sergio Fernández  wrote:
>> Hi,
>>
>> I have one question:
>>
>> How is the distribution of ML models done across all nodes of a Spark cluster?
>>
>> I'm thinking about scenarios where the pipeline implementation does not
>> necessarily need to change, but the models have been upgraded.
>>
>> Thanks in advance.
>>
>> Best regards,
>>
>> --
>> Sergio Fernández
>> Partner Technology Manager
>> Redlink GmbH
>> m: +43 6602747925
>> e: sergio.fernan...@redlink.co
>> w: http://redlink.co
>




Re: ml models distribution

2016-07-22 Thread Jacek Laskowski
Hi,

What's an ML model?

(I'm sure once we find out the answer, you'll know the answer to your
question :))

Regards,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Fri, Jul 22, 2016 at 11:49 AM, Sergio Fernández  wrote:
> Hi,
>
> I have one question:
>
> How is the distribution of ML models done across all nodes of a Spark cluster?
>
> I'm thinking about scenarios where the pipeline implementation does not
> necessarily need to change, but the models have been upgraded.
>
> Thanks in advance.
>
> Best regards,
>
> --
> Sergio Fernández
> Partner Technology Manager
> Redlink GmbH
> m: +43 6602747925
> e: sergio.fernan...@redlink.co
> w: http://redlink.co




ml models distribution

2016-07-22 Thread Sergio Fernández
Hi,

I have one question:

How is the distribution of ML models done across all nodes of a Spark cluster?

I'm thinking about scenarios where the pipeline implementation does not
necessarily need to change, but the models have been upgraded.

Thanks in advance.

Best regards,

-- 
Sergio Fernández
Partner Technology Manager
Redlink GmbH
m: +43 6602747925
e: sergio.fernan...@redlink.co
w: http://redlink.co