+1

Andy

From:  darren <dar...@ontrenet.com>
Date:  Thursday, November 12, 2015 at 12:34 PM
To:  "Kothuvatiparambil, Viju" <viju.kothuvatiparam...@bankofamerica.com>,
DB Tsai <dbt...@dbtsai.com>, Sean Owen <so...@cloudera.com>
Cc:  Felix Cheung <felixcheun...@hotmail.com>, Nirmal Fernando
<nir...@wso2.com>, Andrew Davidson <a...@santacruzintegration.com>, Adrian
Tanase <atan...@adobe.com>, "user @spark" <user@spark.apache.org>, Xiangrui
Meng <men...@gmail.com>, "hol...@pigscanfly.ca" <hol...@pigscanfly.ca>
Subject:  RE: thought experiment: use spark ML to real time prediction

>  
> I agree 100%. Making the model requires large data and many CPUs.
> 
> Using it does not.
> 
> This is a very useful side effect of ML models.
> 
> If MLlib can't use models outside Spark, that's a real shame.
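Darren's point that using a model is cheap can be made concrete for a linear model. The following is a minimal sketch in Python. It assumes a binary logistic regression whose weight vector and intercept have already been copied out of the trained model. The function names, the example parameters and the 0.5 threshold are hypothetical and are not an MLlib API. Scoring one example needs only a dot product and a sigmoid, with no cluster involved.

```python
import math
from typing import Sequence


def margin(weights: Sequence[float], intercept: float, features: Sequence[float]) -> float:
    """Linear margin w.x + b for a single example."""
    if len(weights) != len(features):
        raise ValueError("feature vector length does not match the model")
    return intercept + sum(w * x for w, x in zip(weights, features))


def predict_probability(weights: Sequence[float], intercept: float,
                        features: Sequence[float]) -> float:
    """Probability of the positive class, using a numerically stable sigmoid."""
    m = margin(weights, intercept, features)
    if m >= 0:
        return 1.0 / (1.0 + math.exp(-m))
    z = math.exp(m)  # m < 0 here, so exp cannot overflow
    return z / (1.0 + z)


def predict_label(weights: Sequence[float], intercept: float,
                  features: Sequence[float], threshold: float = 0.5) -> int:
    """Hard 0/1 prediction; the default threshold is a hypothetical choice."""
    return 1 if predict_probability(weights, intercept, features) > threshold else 0


if __name__ == "__main__":
    # Hypothetical parameters standing in for ones exported from a trained model.
    w, b = [1.0, -1.0], 0.0
    print(predict_probability(w, b, [2.0, 2.0]))  # 0.5
    print(predict_label(w, b, [3.0, 1.0]))        # 1
```

The cheap part is the arithmetic. The harder part is reproducing the feature preprocessing used at training time, which is what DB Tsai's first point is about.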
> 
> 
> -------- Original message --------
> From: "Kothuvatiparambil, Viju" <viju.kothuvatiparam...@bankofamerica.com>
> Date: 11/12/2015  3:09 PM  (GMT-05:00)
> To: DB Tsai <dbt...@dbtsai.com>, Sean Owen <so...@cloudera.com>
> Cc: Felix Cheung <felixcheun...@hotmail.com>, Nirmal Fernando
> <nir...@wso2.com>, Andy Davidson <a...@santacruzintegration.com>, Adrian
> Tanase <atan...@adobe.com>, "user @spark" <user@spark.apache.org>, Xiangrui
> Meng <men...@gmail.com>, hol...@pigscanfly.ca
> Subject: RE: thought experiment: use spark ML to real time prediction
> 
> I am glad to see DB's comments; they make me feel I am not the only one
> facing these issues. If we were able to use MLlib to load the model in web
> applications (outside the Spark cluster), that would solve the issue. I
> understand Spark is mainly for processing big data in a distributed mode. But
> there is no point in training a model with MLlib if we are not able to use
> it in the applications that need to access the model.
>  
> Thanks
> Viju
>  
> From: DB Tsai [mailto:dbt...@dbtsai.com]
> Sent: Thursday, November 12, 2015 11:04 AM
> To: Sean Owen
> Cc: Felix Cheung; Nirmal Fernando; Andy Davidson; Adrian Tanase; user @spark;
> Xiangrui Meng; hol...@pigscanfly.ca
> Subject: Re: thought experiment: use spark ML to real time prediction
>  
> 
> I think the use case can be quite different from PMML.
> 
>  
> 
> Having a Spark-platform-independent ML jar would empower users to do the
> following:
> 
>  
> 
> 1) PMML doesn't contain all the models we have in MLlib. Also, for an ML
> pipeline trained by Spark, most of the time PMML is not expressive enough to
> capture all the transformations we have in Spark ML. As a result, if we were
> able to serialize the entire Spark ML pipeline after training, and then load
> it back in an app without any Spark platform for production scoring, this
> would be very useful for production deployment of Spark ML models. The only
> issue is when a transformer involves a shuffle; we need to figure out a way
> to handle it. When I chatted with Xiangrui about this, he suggested that we
> could tag whether a transformer is shuffle-ready. Currently, at Netflix, we
> are not able to use ML pipelines because of these issues, and we have to
> write our own scorers in production, which is quite a lot of duplicated work.
> 
>  
> 
> 2) If users could use Spark's linear algebra code, such as vectors and
> matrices, in their applications, this would be very useful. It would help
> share code between the Spark training pipeline and production deployment.
> Also, lots of good stuff in Spark's MLlib doesn't depend on the Spark
> platform, and people could use it in their applications without pulling in
> lots of dependencies. In fact, in my project, I have to copy and paste code
> from MLlib to use those goodies in apps.
> 
>  
> 
> 3) Currently, MLlib depends on GraphX, which means that in GraphX there is no
> way to use MLlib's vector or matrix. And
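The idea in point 1), exporting a fitted pipeline and replaying it without Spark while tagging transformers that need a shuffle, can be sketched as follows. This is a hypothetical design in plain Python, not Spark ML's actual Transformer API, which operates on DataFrames. `Stage`, `requires_shuffle`, `LocalPipeline` and the three example stages are illustrative names only. Plain lists stand in for the dependency-free vector code described in point 2).

```python
from abc import ABC, abstractmethod
from typing import List, Sequence

Row = List[float]


class Stage(ABC):
    """A fitted pipeline stage that can score one row at a time (hypothetical)."""

    # Hypothetical tag: True if the stage needs data from other rows (a shuffle)
    # and therefore cannot run inside a single-request scorer.
    requires_shuffle: bool = False

    @abstractmethod
    def transform_row(self, row: Row) -> Row:
        ...


class StandardScalerStage(Stage):
    """Row-local: reuses the mean and std learned at training time."""

    def __init__(self, mean: Sequence[float], std: Sequence[float]) -> None:
        self.mean = list(mean)
        self.std = list(std)

    def transform_row(self, row: Row) -> Row:
        # Zero-variance features are mapped to 0.0 to avoid dividing by zero.
        return [(x - m) / s if s != 0.0 else 0.0
                for x, m, s in zip(row, self.mean, self.std)]


class LinearModelStage(Stage):
    """Row-local: emits the linear margin w.x + b."""

    def __init__(self, weights: Sequence[float], intercept: float) -> None:
        self.weights = list(weights)
        self.intercept = intercept

    def transform_row(self, row: Row) -> Row:
        return [self.intercept + sum(w * x for w, x in zip(self.weights, row))]


class GlobalRankStage(Stage):
    """Needs the whole dataset (e.g. percentile ranks), so it is tagged."""

    requires_shuffle = True

    def transform_row(self, row: Row) -> Row:
        raise NotImplementedError("requires data from other rows")


class LocalPipeline:
    """Applies fitted stages in order, without any cluster."""

    def __init__(self, stages: Sequence[Stage]) -> None:
        # Reject shuffle-dependent stages up front instead of failing per request.
        blocked = [type(s).__name__ for s in stages if s.requires_shuffle]
        if blocked:
            raise ValueError(f"stages need a shuffle, cannot serve locally: {blocked}")
        self.stages = list(stages)

    def score(self, row: Row) -> Row:
        for stage in self.stages:
            row = stage.transform_row(row)
        return row


if __name__ == "__main__":
    pipeline = LocalPipeline([
        StandardScalerStage(mean=[1.0, 2.0], std=[1.0, 2.0]),
        LinearModelStage(weights=[1.0, 1.0], intercept=0.5),
    ])
    print(pipeline.score([2.0, 4.0]))  # [2.5]
```

A serving application would build the `LocalPipeline` once from the exported parameters and call `score` per request. Checking the tag at construction time surfaces a shuffle-dependent stage before deployment rather than at request time.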

