Re: Deploying ML Pipeline Model

2016-07-05 Thread Nick Pentreath
It all depends on your latency requirements and volume. 100s of queries per
minute, with an acceptable latency of up to a few seconds? Yes, you could
use Spark for serving, especially if you're smart about caching results
(and I don't mean just Spark caching, but caching recommendation results,
for example similar items, etc.).

However, for many serving use cases, using a Spark cluster is too much
overhead. Bear in mind that real-world serving of many models (recommendations,
ad-serving, fraud, etc.) is one component of a complex workflow (e.g. one
page request in ad-tech cases involves tens of requests and hops between
various ad servers and exchanges). That is why the practical latency
bounds are often < 100ms (or way, way tighter for ad serving, for example).


On Fri, 1 Jul 2016 at 21:59 Saurabh Sardeshpande 
wrote:

> Hi Nick,
>
> Thanks for the answer. Do you think an implementation like the one in this
> article is infeasible in production for say, hundreds of queries per
> minute?
> https://www.codementor.io/spark/tutorial/building-a-web-service-with-apache-spark-flask-example-app-part2.
> The article uses Flask to define routes and Spark for evaluating requests.
>
> Regards,
> Saurabh
>
> On Fri, Jul 1, 2016 at 10:47 AM, Nick Pentreath 
> wrote:
>
>> Generally there are 2 ways to use a trained pipeline model - (offline)
>> batch scoring, and real-time online scoring.
>>
>> For batch (or even "mini-batch" e.g. on Spark streaming data), then yes
>> certainly loading the model back in Spark and feeding new data through the
>> pipeline for prediction works just fine, and this is essentially what is
>> supported in 1.6 (and more or less full coverage in 2.0). For large batch
>> cases this can be quite efficient.
>>
>> However, usually for real-time use cases, the latency required is fairly
>> low - of the order of a few ms to a few 100ms for a request (some examples
>> include recommendations, ad-serving, fraud detection etc).
>>
>> In these cases, using Spark has 2 issues: (1) latency for prediction on
>> the pipeline, which is based on DataFrames and therefore distributed
>> execution, is usually fairly high "per request"; (2) this requires pulling
>> in all of Spark for your real-time serving layer (or running a full Spark
>> cluster), which is usually way too much overkill - all you really need for
>> serving is a bit of linear algebra and some basic transformations.
>>
>> So for now, unfortunately there is not much in the way of options for
>> exporting your pipelines and serving them outside of Spark - the
>> JPMML-based project mentioned on this thread is one option. The other
>> option at this point is to write your own export functionality and your own
>> serving layer.
>>
>> There is (very initial) movement towards improving the local serving
>> possibilities (see https://issues.apache.org/jira/browse/SPARK-13944 which
>> was the "first step" in this process).
>>
>> On Fri, 1 Jul 2016 at 19:24 Jacek Laskowski  wrote:
>>
>>> Hi Rishabh,
>>>
>>> I've just today had similar conversation about how to do a ML Pipeline
>>> deployment and couldn't really answer this question and more because I
>>> don't really understand the use case.
>>>
>>> What would you expect from ML Pipeline model deployment? You can save
>>> your model to a file by model.write.overwrite.save("model_v1").
>>>
>>> model_v1
>>> |-- metadata
>>> |   |-- _SUCCESS
>>> |   `-- part-0
>>> `-- stages
>>> |-- 0_regexTok_b4265099cc1c
>>> |   `-- metadata
>>> |   |-- _SUCCESS
>>> |   `-- part-0
>>> |-- 1_hashingTF_8de997cf54ba
>>> |   `-- metadata
>>> |   |-- _SUCCESS
>>> |   `-- part-0
>>> `-- 2_linReg_3942a71d2c0e
>>> |-- data
>>> |   |-- _SUCCESS
>>> |   |-- _common_metadata
>>> |   |-- _metadata
>>> |   `--
>>> part-r-0-2096c55a-d654-42b2-90d3-5a310101cba5.gz.parquet
>>> `-- metadata
>>> |-- _SUCCESS
>>> `-- part-0
>>>
>>> 9 directories, 12 files
>>>
>>> What would you like to have outside SparkContext? What's wrong with
>>> using Spark? Just curious hoping to understand the use case better.
>>> Thanks.
>>>
>>> Pozdrawiam,
>>> Jacek Laskowski
>>> 
>>> https://medium.com/@jaceklaskowski/
>>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>>> Follow me at https://twitter.com/jaceklaskowski
>>>
>>>
>>> On Fri, Jul 1, 2016 at 12:54 PM, Rishabh Bhardwaj 
>>> wrote:
>>> > Hi All,
>>> >
>>> > I am looking for ways to deploy a ML Pipeline model in production .
>>> > Spark has already proved to be a one of the best framework for model
>>> > training and creation, but once the ml pipeline model is ready how can I
>>> > deploy it outside spark context?
>>> > MLlib model has toPMML method but today Pipeline model can not be saved to
>>> > PMML. There are some frameworks like MLeap which are trying to abstract
>>> > Pipeline Model and provide ML Pipeline Model deployment outside spark
>>> > context, but currently they don't have most of the ml transformers and
>>> > estimators.
>>> > I am looking for related work going on this area.
>>> > Any pointers will be helpful.
>>> >
>>> > Thanks,
>>> > Rishabh.

Re: Deploying ML Pipeline Model

2016-07-05 Thread Nick Pentreath
Sean is correct - we now use jpmml-model (which is actually BSD 3-clause,
whereas the old jpmml was Apache 2, but either works).

On Fri, 1 Jul 2016 at 21:40 Sean Owen  wrote:

> (The more core JPMML libs are Apache 2; OpenScoring is AGPL. We use
> JPMML in Spark and couldn't otherwise because the Affero license is
> not Apache compatible.)
>
> On Fri, Jul 1, 2016 at 8:16 PM, Nick Pentreath 
> wrote:
> > I believe open-scoring is one of the well-known PMML serving frameworks
> in
> > Java land (https://github.com/jpmml/openscoring). One can also use the
> raw
> > https://github.com/jpmml/jpmml-evaluator for embedding in apps.
> >
> > (Note the license on both of these is AGPL - the older version of JPMML
> used
> > to be Apache2 if I recall correctly).
> >
>


Re: Deploying ML Pipeline Model

2016-07-01 Thread Saurabh Sardeshpande
Hi Nick,

Thanks for the answer. Do you think an implementation like the one in this
article is infeasible in production for, say, hundreds of queries per
minute?
https://www.codementor.io/spark/tutorial/building-a-web-service-with-apache-spark-flask-example-app-part2.
The article uses Flask to define routes and Spark for evaluating requests.
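
The shape of the article's approach, reduced to a stdlib-only sketch (no
Flask, no Spark; `predict` here is a hypothetical stand-in for the preloaded
model's transform call):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

# Hypothetical stand-in for scoring with a preloaded model.
def predict(x):
    return 2.0 * x + 1.0

class PredictHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Parse ?x=... from the query string and return a JSON prediction.
        qs = parse_qs(urlparse(self.path).query)
        x = float(qs.get("x", ["0"])[0])
        body = json.dumps({"prediction": predict(x)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

def serve(port=8080):
    # Blocks forever; run this in your service process.
    HTTPServer(("", port), PredictHandler).serve_forever()
```

The open question is only what sits behind `predict`: an in-process scorer
keeps latency low, while calling into a Spark job per request does not.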

Regards,
Saurabh

On Fri, Jul 1, 2016 at 10:47 AM, Nick Pentreath 
wrote:

> Generally there are 2 ways to use a trained pipeline model - (offline)
> batch scoring, and real-time online scoring.
>
> For batch (or even "mini-batch" e.g. on Spark streaming data), then yes
> certainly loading the model back in Spark and feeding new data through the
> pipeline for prediction works just fine, and this is essentially what is
> supported in 1.6 (and more or less full coverage in 2.0). For large batch
> cases this can be quite efficient.
>
> However, usually for real-time use cases, the latency required is fairly
> low - of the order of a few ms to a few 100ms for a request (some examples
> include recommendations, ad-serving, fraud detection etc).
>
> In these cases, using Spark has 2 issues: (1) latency for prediction on
> the pipeline, which is based on DataFrames and therefore distributed
> execution, is usually fairly high "per request"; (2) this requires pulling
> in all of Spark for your real-time serving layer (or running a full Spark
> cluster), which is usually way too much overkill - all you really need for
> serving is a bit of linear algebra and some basic transformations.
>
> So for now, unfortunately there is not much in the way of options for
> exporting your pipelines and serving them outside of Spark - the
> JPMML-based project mentioned on this thread is one option. The other
> option at this point is to write your own export functionality and your own
> serving layer.
>
> There is (very initial) movement towards improving the local serving
> possibilities (see https://issues.apache.org/jira/browse/SPARK-13944 which
> was the "first step" in this process).
>
> On Fri, 1 Jul 2016 at 19:24 Jacek Laskowski  wrote:
>
>> Hi Rishabh,
>>
>> I've just today had similar conversation about how to do a ML Pipeline
>> deployment and couldn't really answer this question and more because I
>> don't really understand the use case.
>>
>> What would you expect from ML Pipeline model deployment? You can save
>> your model to a file by model.write.overwrite.save("model_v1").
>>
>> model_v1
>> |-- metadata
>> |   |-- _SUCCESS
>> |   `-- part-0
>> `-- stages
>> |-- 0_regexTok_b4265099cc1c
>> |   `-- metadata
>> |   |-- _SUCCESS
>> |   `-- part-0
>> |-- 1_hashingTF_8de997cf54ba
>> |   `-- metadata
>> |   |-- _SUCCESS
>> |   `-- part-0
>> `-- 2_linReg_3942a71d2c0e
>> |-- data
>> |   |-- _SUCCESS
>> |   |-- _common_metadata
>> |   |-- _metadata
>> |   `--
>> part-r-0-2096c55a-d654-42b2-90d3-5a310101cba5.gz.parquet
>> `-- metadata
>> |-- _SUCCESS
>> `-- part-0
>>
>> 9 directories, 12 files
>>
>> What would you like to have outside SparkContext? What's wrong with
>> using Spark? Just curious hoping to understand the use case better.
>> Thanks.
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> 
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>>
>>
>> On Fri, Jul 1, 2016 at 12:54 PM, Rishabh Bhardwaj 
>> wrote:
>> > Hi All,
>> >
>> > I am looking for ways to deploy a ML Pipeline model in production .
>> > Spark has already proved to be a one of the best framework for model
>> > training and creation, but once the ml pipeline model is ready how can I
>> > deploy it outside spark context ?
>> > MLlib model has toPMML method but today Pipeline model can not be saved
>> to
>> > PMML. There are some frameworks like MLeap which are trying to abstract
>> > Pipeline Model and provide ML Pipeline Model deployment outside spark
>> > context,but currently they don't have most of the ml transformers and
>> > estimators.
>> > I am looking for related work going on this area.
>> > Any pointers will be helpful.
>> >
>> > Thanks,
>> > Rishabh.
>>
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>>


Re: Deploying ML Pipeline Model

2016-07-01 Thread Sean Owen
(The more core JPMML libs are Apache 2; OpenScoring is AGPL. We use
JPMML in Spark and couldn't otherwise because the Affero license is
not Apache compatible.)

On Fri, Jul 1, 2016 at 8:16 PM, Nick Pentreath  wrote:
> I believe open-scoring is one of the well-known PMML serving frameworks in
> Java land (https://github.com/jpmml/openscoring). One can also use the raw
> https://github.com/jpmml/jpmml-evaluator for embedding in apps.
>
> (Note the license on both of these is AGPL - the older version of JPMML used
> to be Apache2 if I recall correctly).
>




Re: Deploying ML Pipeline Model

2016-07-01 Thread Nick Pentreath
I believe open-scoring is one of the well-known PMML serving frameworks in
Java land (https://github.com/jpmml/openscoring). One can also use the raw
https://github.com/jpmml/jpmml-evaluator for embedding in apps.

(Note the license on both of these is AGPL - the older version of JPMML
used to be Apache2 if I recall correctly).

On Fri, 1 Jul 2016 at 20:15 Jacek Laskowski  wrote:

> Hi Nick,
>
> Thanks a lot for the exhaustive and prompt response! (In the meantime
> I watched a video about PMML to get a better understanding of the
> topic).
>
> What are the tools that could "consume" PMML exports (after running
> JPMML)? What tools would be the endpoint to deliver low-latency
> predictions by doing this "a bit of linear algebra and some basic
> transformations"?
>
> Pozdrawiam,
> Jacek Laskowski
> 
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Fri, Jul 1, 2016 at 6:47 PM, Nick Pentreath 
> wrote:
> > Generally there are 2 ways to use a trained pipeline model - (offline)
> batch
> > scoring, and real-time online scoring.
> >
> > For batch (or even "mini-batch" e.g. on Spark streaming data), then yes
> > certainly loading the model back in Spark and feeding new data through
> the
> > pipeline for prediction works just fine, and this is essentially what is
> > supported in 1.6 (and more or less full coverage in 2.0). For large batch
> > cases this can be quite efficient.
> >
> > However, usually for real-time use cases, the latency required is fairly
> low
> > - of the order of a few ms to a few 100ms for a request (some examples
> > include recommendations, ad-serving, fraud detection etc).
> >
> > In these cases, using Spark has 2 issues: (1) latency for prediction on
> the
> > pipeline, which is based on DataFrames and therefore distributed
> execution,
> > is usually fairly high "per request"; (2) this requires pulling in all of
> > Spark for your real-time serving layer (or running a full Spark cluster),
> > which is usually way too much overkill - all you really need for serving
> is
> > a bit of linear algebra and some basic transformations.
> >
> > So for now, unfortunately there is not much in the way of options for
> > exporting your pipelines and serving them outside of Spark - the
> JPMML-based
> > project mentioned on this thread is one option. The other option at this
> > point is to write your own export functionality and your own serving
> layer.
> >
> > There is (very initial) movement towards improving the local serving
> > possibilities (see https://issues.apache.org/jira/browse/SPARK-13944
> which
> > was the "first step" in this process).
> >
> > On Fri, 1 Jul 2016 at 19:24 Jacek Laskowski  wrote:
> >>
> >> Hi Rishabh,
> >>
> >> I've just today had similar conversation about how to do a ML Pipeline
> >> deployment and couldn't really answer this question and more because I
> >> don't really understand the use case.
> >>
> >> What would you expect from ML Pipeline model deployment? You can save
> >> your model to a file by model.write.overwrite.save("model_v1").
> >>
> >> model_v1
> >> |-- metadata
> >> |   |-- _SUCCESS
> >> |   `-- part-0
> >> `-- stages
> >> |-- 0_regexTok_b4265099cc1c
> >> |   `-- metadata
> >> |   |-- _SUCCESS
> >> |   `-- part-0
> >> |-- 1_hashingTF_8de997cf54ba
> >> |   `-- metadata
> >> |   |-- _SUCCESS
> >> |   `-- part-0
> >> `-- 2_linReg_3942a71d2c0e
> >> |-- data
> >> |   |-- _SUCCESS
> >> |   |-- _common_metadata
> >> |   |-- _metadata
> >> |   `--
> >> part-r-0-2096c55a-d654-42b2-90d3-5a310101cba5.gz.parquet
> >> `-- metadata
> >> |-- _SUCCESS
> >> `-- part-0
> >>
> >> 9 directories, 12 files
> >>
> >> What would you like to have outside SparkContext? What's wrong with
> >> using Spark? Just curious hoping to understand the use case better.
> >> Thanks.
> >>
> >> Pozdrawiam,
> >> Jacek Laskowski
> >> 
> >> https://medium.com/@jaceklaskowski/
> >> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> >> Follow me at https://twitter.com/jaceklaskowski
> >>
> >>
> >> On Fri, Jul 1, 2016 at 12:54 PM, Rishabh Bhardwaj 
> >> wrote:
> >> > Hi All,
> >> >
> >> > I am looking for ways to deploy a ML Pipeline model in production .
> >> > Spark has already proved to be a one of the best framework for model
> >> > training and creation, but once the ml pipeline model is ready how
> can I
> >> > deploy it outside spark context ?
> >> > MLlib model has toPMML method but today Pipeline model can not be
> saved
> >> > to
> >> > PMML. There are some frameworks like MLeap which are trying to
> abstract
> >> > Pipeline Model and provide ML Pipeline Model deployment outside spark
> >> > context, but currently they don't have most of the ml transformers and
> >> > estimators.
> >> > I am looking for related work going on this area.
> >> > Any pointers will be helpful.
> >> >
> >> > Thanks,
> >> > Rishabh.

Re: Deploying ML Pipeline Model

2016-07-01 Thread Jacek Laskowski
Hi Nick,

Thanks a lot for the exhaustive and prompt response! (In the meantime
I watched a video about PMML to get a better understanding of the
topic).

What are the tools that could "consume" PMML exports (after running
JPMML)? What tools would be the endpoint to deliver low-latency
predictions by doing this "a bit of linear algebra and some basic
transformations"?
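
To make concrete what "consuming" a PMML export involves: PMML is just XML,
and evaluating, say, a regression model is a few lines of arithmetic. A
hand-simplified sketch (real PMML exports carry a namespace and a
DataDictionary, omitted here for readability):

```python
import xml.etree.ElementTree as ET

# Hand-written minimal fragment in the shape of a PMML RegressionModel.
PMML = """
<PMML version="4.2">
  <RegressionModel functionName="regression">
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x1" coefficient="0.5"/>
      <NumericPredictor name="x2" coefficient="-2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

def score(pmml_text, features):
    # Intercept plus coefficient * feature for each predictor: the whole
    # online-scoring path for a linear model.
    table = ET.fromstring(pmml_text).find(".//RegressionTable")
    y = float(table.get("intercept"))
    for p in table.findall("NumericPredictor"):
        y += float(p.get("coefficient")) * features[p.get("name")]
    return y

print(score(PMML, {"x1": 2.0, "x2": 0.5}))  # 1.5 + 1.0 - 1.0 = 1.5
```

In practice JPMML Evaluator or Openscoring does this (plus validation and
the full transformer zoo) for you.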

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Fri, Jul 1, 2016 at 6:47 PM, Nick Pentreath  wrote:
> Generally there are 2 ways to use a trained pipeline model - (offline) batch
> scoring, and real-time online scoring.
>
> For batch (or even "mini-batch" e.g. on Spark streaming data), then yes
> certainly loading the model back in Spark and feeding new data through the
> pipeline for prediction works just fine, and this is essentially what is
> supported in 1.6 (and more or less full coverage in 2.0). For large batch
> cases this can be quite efficient.
>
> However, usually for real-time use cases, the latency required is fairly low
> - of the order of a few ms to a few 100ms for a request (some examples
> include recommendations, ad-serving, fraud detection etc).
>
> In these cases, using Spark has 2 issues: (1) latency for prediction on the
> pipeline, which is based on DataFrames and therefore distributed execution,
> is usually fairly high "per request"; (2) this requires pulling in all of
> Spark for your real-time serving layer (or running a full Spark cluster),
> which is usually way too much overkill - all you really need for serving is
> a bit of linear algebra and some basic transformations.
>
> So for now, unfortunately there is not much in the way of options for
> exporting your pipelines and serving them outside of Spark - the JPMML-based
> project mentioned on this thread is one option. The other option at this
> point is to write your own export functionality and your own serving layer.
>
> There is (very initial) movement towards improving the local serving
> possibilities (see https://issues.apache.org/jira/browse/SPARK-13944 which
> was the "first step" in this process).
>
> On Fri, 1 Jul 2016 at 19:24 Jacek Laskowski  wrote:
>>
>> Hi Rishabh,
>>
>> I've just today had similar conversation about how to do a ML Pipeline
>> deployment and couldn't really answer this question and more because I
>> don't really understand the use case.
>>
>> What would you expect from ML Pipeline model deployment? You can save
>> your model to a file by model.write.overwrite.save("model_v1").
>>
>> model_v1
>> |-- metadata
>> |   |-- _SUCCESS
>> |   `-- part-0
>> `-- stages
>> |-- 0_regexTok_b4265099cc1c
>> |   `-- metadata
>> |   |-- _SUCCESS
>> |   `-- part-0
>> |-- 1_hashingTF_8de997cf54ba
>> |   `-- metadata
>> |   |-- _SUCCESS
>> |   `-- part-0
>> `-- 2_linReg_3942a71d2c0e
>> |-- data
>> |   |-- _SUCCESS
>> |   |-- _common_metadata
>> |   |-- _metadata
>> |   `--
>> part-r-0-2096c55a-d654-42b2-90d3-5a310101cba5.gz.parquet
>> `-- metadata
>> |-- _SUCCESS
>> `-- part-0
>>
>> 9 directories, 12 files
>>
>> What would you like to have outside SparkContext? What's wrong with
>> using Spark? Just curious hoping to understand the use case better.
>> Thanks.
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> 
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>>
>>
>> On Fri, Jul 1, 2016 at 12:54 PM, Rishabh Bhardwaj 
>> wrote:
>> > Hi All,
>> >
>> > I am looking for ways to deploy a ML Pipeline model in production .
>> > Spark has already proved to be a one of the best framework for model
>> > training and creation, but once the ml pipeline model is ready how can I
>> > deploy it outside spark context ?
>> > MLlib model has toPMML method but today Pipeline model can not be saved
>> > to
>> > PMML. There are some frameworks like MLeap which are trying to abstract
>> > Pipeline Model and provide ML Pipeline Model deployment outside spark
>> > context,but currently they don't have most of the ml transformers and
>> > estimators.
>> > I am looking for related work going on this area.
>> > Any pointers will be helpful.
>> >
>> > Thanks,
>> > Rishabh.
>>
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>




Re: Deploying ML Pipeline Model

2016-07-01 Thread Nick Pentreath
Generally there are 2 ways to use a trained pipeline model - (offline)
batch scoring, and real-time online scoring.

For batch (or even "mini-batch" e.g. on Spark streaming data), then yes
certainly loading the model back in Spark and feeding new data through the
pipeline for prediction works just fine, and this is essentially what is
supported in 1.6 (and more or less full coverage in 2.0). For large batch
cases this can be quite efficient.

However, usually for real-time use cases the latency required is fairly
low - on the order of a few ms to a few hundred ms per request (some examples
include recommendations, ad-serving, fraud detection, etc.).

In these cases, using Spark has two issues: (1) latency for prediction on the
pipeline, which is based on DataFrames and therefore distributed execution,
is usually fairly high "per request"; (2) it requires pulling in all of
Spark for your real-time serving layer (or running a full Spark cluster),
which is usually overkill - all you really need for serving is
a bit of linear algebra and some basic transformations.
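
To illustrate "a bit of linear algebra": scoring a fitted logistic regression
outside Spark needs only the coefficients and intercept. A sketch with made-up
values (in practice they might be pulled from lrModel.coefficients and
lrModel.intercept):

```python
import math

# Hypothetical fitted parameters exported from the trained model.
weights = [0.8, -1.2, 0.3]
intercept = 0.1

def predict_proba(features):
    # Dot product plus intercept, then the logistic link: that is the
    # entire online-scoring path for this model family.
    margin = intercept + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-margin))

print(predict_proba([1.0, 0.5, 2.0]))
```

No DataFrame, no cluster: just this, plus whatever feature transformations
the pipeline applied before the estimator.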

So for now, unfortunately there is not much in the way of options for
exporting your pipelines and serving them outside of Spark - the
JPMML-based project mentioned on this thread is one option. The other
option at this point is to write your own export functionality and your own
serving layer.

There is (very initial) movement towards improving the local serving
possibilities (see https://issues.apache.org/jira/browse/SPARK-13944 which
was the "first step" in this process).

On Fri, 1 Jul 2016 at 19:24 Jacek Laskowski  wrote:

> Hi Rishabh,
>
> I've just today had similar conversation about how to do a ML Pipeline
> deployment and couldn't really answer this question and more because I
> don't really understand the use case.
>
> What would you expect from ML Pipeline model deployment? You can save
> your model to a file by model.write.overwrite.save("model_v1").
>
> model_v1
> |-- metadata
> |   |-- _SUCCESS
> |   `-- part-0
> `-- stages
> |-- 0_regexTok_b4265099cc1c
> |   `-- metadata
> |   |-- _SUCCESS
> |   `-- part-0
> |-- 1_hashingTF_8de997cf54ba
> |   `-- metadata
> |   |-- _SUCCESS
> |   `-- part-0
> `-- 2_linReg_3942a71d2c0e
> |-- data
> |   |-- _SUCCESS
> |   |-- _common_metadata
> |   |-- _metadata
> |   `--
> part-r-0-2096c55a-d654-42b2-90d3-5a310101cba5.gz.parquet
> `-- metadata
> |-- _SUCCESS
> `-- part-0
>
> 9 directories, 12 files
>
> What would you like to have outside SparkContext? What's wrong with
> using Spark? Just curious hoping to understand the use case better.
> Thanks.
>
> Pozdrawiam,
> Jacek Laskowski
> 
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Fri, Jul 1, 2016 at 12:54 PM, Rishabh Bhardwaj 
> wrote:
> > Hi All,
> >
> > I am looking for ways to deploy a ML Pipeline model in production .
> > Spark has already proved to be a one of the best framework for model
> > training and creation, but once the ml pipeline model is ready how can I
> > deploy it outside spark context ?
> > MLlib model has toPMML method but today Pipeline model can not be saved
> to
> > PMML. There are some frameworks like MLeap which are trying to abstract
> > Pipeline Model and provide ML Pipeline Model deployment outside spark
> > context,but currently they don't have most of the ml transformers and
> > estimators.
> > I am looking for related work going on this area.
> > Any pointers will be helpful.
> >
> > Thanks,
> > Rishabh.
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


Re: Deploying ML Pipeline Model

2016-07-01 Thread Jacek Laskowski
Hi Rishabh,

I've just today had a similar conversation about how to do an ML Pipeline
deployment, and couldn't really answer this question (and others) because I
don't really understand the use case.

What would you expect from ML Pipeline model deployment? You can save
your model to a file by model.write.overwrite.save("model_v1").

model_v1
|-- metadata
|   |-- _SUCCESS
|   `-- part-0
`-- stages
|-- 0_regexTok_b4265099cc1c
|   `-- metadata
|   |-- _SUCCESS
|   `-- part-0
|-- 1_hashingTF_8de997cf54ba
|   `-- metadata
|   |-- _SUCCESS
|   `-- part-0
`-- 2_linReg_3942a71d2c0e
|-- data
|   |-- _SUCCESS
|   |-- _common_metadata
|   |-- _metadata
|   `-- part-r-0-2096c55a-d654-42b2-90d3-5a310101cba5.gz.parquet
`-- metadata
|-- _SUCCESS
`-- part-0

9 directories, 12 files
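
Incidentally, those metadata files are plain JSON, so even a non-Spark
process can at least introspect a saved pipeline. A sketch with made-up (but
representative) metadata content; the exact fields come from Spark's ML
persistence format and may differ across versions:

```python
import json

# Illustrative only: roughly the shape of stages/0_regexTok_*/metadata/part-0.
metadata = json.loads("""
{
  "class": "org.apache.spark.ml.feature.RegexTokenizer",
  "timestamp": 1467369600000,
  "sparkVersion": "1.6.2",
  "uid": "regexTok_b4265099cc1c",
  "paramMap": {"pattern": "\\\\s+", "inputCol": "text", "outputCol": "words"}
}
""")

print(metadata["class"])
print(metadata["paramMap"]["pattern"])
```

The linear-model coefficients themselves live in the Parquet `data` files, so
a serving layer would still need a Parquet reader (or its own export format).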

What would you like to have outside SparkContext? What's wrong with
using Spark? Just curious hoping to understand the use case better.
Thanks.

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Fri, Jul 1, 2016 at 12:54 PM, Rishabh Bhardwaj  wrote:
> Hi All,
>
> I am looking for ways to deploy a ML Pipeline model in production .
> Spark has already proved to be a one of the best framework for model
> training and creation, but once the ml pipeline model is ready how can I
> deploy it outside spark context ?
> MLlib model has toPMML method but today Pipeline model can not be saved to
> PMML. There are some frameworks like MLeap which are trying to abstract
> Pipeline Model and provide ML Pipeline Model deployment outside spark
> context,but currently they don't have most of the ml transformers and
> estimators.
> I am looking for related work going on this area.
> Any pointers will be helpful.
>
> Thanks,
> Rishabh.




Re: Deploying ML Pipeline Model

2016-07-01 Thread Silvio Fiorito
Hi Rishabh,

My colleague, Richard Garris from Databricks, actually just gave a talk last 
night at the Bay Area Spark Meetup on ML model deployment. The slides and 
recording should be up soon, you should be able to find a link here: 
http://www.meetup.com/spark-users/events/231574440/

Thanks,
Silvio

From: Rishabh Bhardwaj 
Date: Friday, July 1, 2016 at 7:54 AM
To: user 
Cc: "d...@spark.apache.org" 
Subject: Deploying ML Pipeline Model

Hi All,

I am looking for ways to deploy an ML Pipeline model in production.
Spark has already proved to be one of the best frameworks for model training
and creation, but once the ML pipeline model is ready, how can I deploy it
outside the Spark context?
MLlib models have a toPMML method, but today a Pipeline model cannot be saved to
PMML. There are some frameworks, like MLeap, which are trying to abstract the
Pipeline Model and provide ML Pipeline Model deployment outside the Spark
context, but currently they don't have most of the ML transformers and
estimators.
I am looking for related work going on in this area.
Any pointers will be helpful.

Thanks,
Rishabh.


Re: Deploying ML Pipeline Model

2016-07-01 Thread Steve Goodman
Hi Rishabh,

I have a similar use-case and have struggled to find the best solution. As
I understand it, 1.6 provides pipeline persistence in Scala, and that will
be expanded in 2.x. This project https://github.com/jpmml/jpmml-sparkml
claims to support about a dozen pipeline transformers, and 6 or 7 different
model types, although I have not yet used it myself.

Looking forward to hearing better suggestions?

Steve


On Fri, Jul 1, 2016 at 12:54 PM, Rishabh Bhardwaj 
wrote:

> Hi All,
>
> I am looking for ways to deploy a ML Pipeline model in production .
> Spark has already proved to be a one of the best framework for model
> training and creation, but once the ml pipeline model is ready how can I
> deploy it outside spark context ?
> MLlib model has toPMML method but today Pipeline model can not be saved to
> PMML. There are some frameworks like MLeap which are trying to abstract
> Pipeline Model and provide ML Pipeline Model deployment outside spark
> context,but currently they don't have most of the ml transformers and
> estimators.
> I am looking for related work going on this area.
> Any pointers will be helpful.
>
> Thanks,
> Rishabh.
>