Re: Velox Model Server

2015-06-20 Thread Donald Szeto
Mind if I ask which 1.3/1.4 ML features you are looking for?

On Saturday, June 20, 2015, Debasish Das debasish.da...@gmail.com wrote:

 After getting used to Scala, writing Java is too much work :-)

 I am looking for a Scala-based project that uses Netty at its core (Spray
 is one example).

 prediction.io is an option, but it also looks quite complicated, and it is
 not using all the ML features that were added in 1.3/1.4.

 Velox is built on top of the ML / KeystoneML pipeline API, which is useful,
 but it is still using javax servlets, which are not Netty-based.

 On Sat, Jun 20, 2015 at 10:25 AM, Sandy Ryza sandy.r...@cloudera.com wrote:

 Oops, that link was for Oryx 1. Here's the repo for Oryx 2:
 https://github.com/OryxProject/oryx

 On Sat, Jun 20, 2015 at 10:20 AM, Sandy Ryza sandy.r...@cloudera.com wrote:

 Hi Debasish,

 The Oryx project (https://github.com/cloudera/oryx), which is Apache 2
 licensed, contains a model server that can serve models built with MLlib.

 -Sandy

 On Sat, Jun 20, 2015 at 8:00 AM, Charles Earl charles.ce...@gmail.com wrote:

 Is velox NOT open source?


 On Saturday, June 20, 2015, Debasish Das debasish.da...@gmail.com wrote:

 Hi,

 The demo of the end-to-end ML pipeline, including the model server
 component, at Spark Summit was really cool.

 I was wondering whether the Model Server component is based upon Velox or
 uses a completely different architecture.

 https://github.com/amplab/velox-modelserver

 We are looking for an open source model server to build upon.

 Thanks.
 Deb



 --
 - Charles






-- 
Donald Szeto
PredictionIO


Re: Model deployment help

2015-03-21 Thread Donald Szeto
Hi Shashidhar,

Our team at PredictionIO is trying to solve the production deployment of
models. We built a powered-by-Spark framework (also certified on Spark by
Databricks) that allows a user to build models with everything available
from the Spark API, persist them automatically with versioning, and
deploy them as a REST service using simple CLI commands.

Regarding model degeneration and updates: if half a second to a couple of
seconds of downtime is acceptable, with PIO one could simply run pio train
and pio deploy periodically with a cronjob. To achieve virtually zero
downtime, a load balancer could be set up in front of 2 pio deploy
instances.
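
For example, a hypothetical crontab entry for nightly retraining (the
schedule and engine path are illustrative, not defaults):

  # retrain and redeploy every night at 03:00; assumes pio is on the PATH
  0 3 * * * cd /opt/my-engine && pio train && pio deploy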

Porting your current algorithm / model generation to PredictionIO should
just be a copy-and-paste procedure. We would be very grateful for any
feedback that would improve the deployment process.

We do not support PMML at the moment, but we are definitely interested in
your use case.

You may get started with the documentation (http://docs.prediction.io/).
You could also visit the engine template gallery (
https://templates.prediction.io/) for quick, ready-to-use examples.
PredictionIO is open source software under APL2 at
https://github.com/PredictionIO/PredictionIO.

Looking forward to hearing your feedback!


Best Regards,
Donald

On Sat, Mar 21, 2015 at 10:40 AM, Shashidhar Rao raoshashidhar...@gmail.com
 wrote:

 Hi,

 Apologies for the generic question.

 I am developing predictive models for the first time, and the model will
 be deployed in production very soon.

 Could somebody help me with model deployment in production? I have read
 quite a bit on model deployment, and I have read some books on database
 deployment.

 My queries relate to how updates to the model happen without any downtime
 when the current model degenerates, how others are deploying to production
 servers, and the current adoption of PMML in production.

 Please provide me with some good links or forums so that I can learn more,
 as most books do not cover this extensively, except for 'Mahout in Action',
 where it is explained in some detail; I have also checked Stack Overflow
 but have not gotten any relevant answers.

 What I understand:
 1. Build a model using the current training set and test the model.
 2. Deploy the model: put it in some location, load it, and predict when a
 request comes in for scoring.
 3. The model degenerates; now build a new model with new data. (Here I have
 some confusion: is the old data discarded completely, is the new model built
 purely from new data, or is it a mix?)
 4. Here I am stuck: how do I update the model without any downtime during
 the transition period between the old model and the new one?

 My naive solution would be: build the new model, save it in a new location,
 and update the new path in some properties file (or update the location in
 a database) once the save is done. Is this correct, or are there best
 practices available?
 A database is unlikely in my case.
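
 (As an illustration of that swap step, a minimal, hypothetical Scala
 sketch of serving from an atomically swapped model reference; the Model
 and ModelLoader names and the hourly reload are made up, not tied to any
 framework:)

 import java.util.concurrent.atomic.AtomicReference
 import java.util.concurrent.{Executors, TimeUnit}

 // Hypothetical stand-ins for a trained model and whatever loads it.
 case class Model(version: Long) {
   def predict(features: Array[Double]): Double = 0.0 // placeholder scoring
 }
 object ModelLoader {
   def loadLatest(path: String): Model = Model(System.currentTimeMillis()) // stub
 }

 class ModelServer(modelPath: String) {
   private val current = new AtomicReference(ModelLoader.loadLatest(modelPath))

   // Requests always score against whatever model is currently published.
   def score(features: Array[Double]): Double = current.get().predict(features)

   // A background task republishes the model periodically; setting the
   // reference is atomic, so in-flight requests see either the old model
   // or the new one, with no downtime in between.
   Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(
     new Runnable { def run(): Unit = current.set(ModelLoader.loadLatest(modelPath)) },
     1, 1, TimeUnit.HOURS)
 }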

 Thanks in advance.






-- 
Donald Szeto
PredictionIO


Re: Spark and Play

2014-11-12 Thread Donald Szeto
Hi Akshat,

If your application is to serve results directly from a SparkContext, you
may want to take a look at http://prediction.io. It integrates Spark with
spray.io (another REST/web toolkit by Typesafe). Some heavy lifting is done
here:
https://github.com/PredictionIO/PredictionIO/blob/develop/core/src/main/scala/workflow/CreateServer.scala
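
For a taste of the approach, a minimal, hypothetical spray sketch (the
route, port, and in-memory predict stub are illustrative; this is not the
CreateServer code linked above):

import akka.actor.ActorSystem
import spray.routing.SimpleRoutingApp

// Minimal sketch: expose an in-memory prediction function over HTTP with spray.
object PredictServer extends App with SimpleRoutingApp {
  implicit val system = ActorSystem("predict-server")

  // Stand-in for a model built with a SparkContext and kept in memory.
  def predict(q: String): String = s"""{"query": "$q", "score": 0.5}"""

  startServer(interface = "localhost", port = 8000) {
    path("queries.json") {
      get {
        parameter('q) { q =>
          complete(predict(q)) // score against the in-memory model
        }
      }
    }
  }
}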

Regards,
Donald

On Tue, Nov 11, 2014 at 11:35 PM, John Meehan jnmee...@gmail.com wrote:

 You can also build a Play 2.2.x + Spark 1.1.0 fat jar with sbt-assembly
 for, e.g., yarn-client support, or for use with spark-shell when debugging:

 import sbtassembly.Plugin._
 import AssemblyKeys._

 play.Project.playScalaSettings

 libraryDependencies ~= { _ map {
   case m if m.organization == "com.typesafe.play" =>
     m.exclude("commons-logging", "commons-logging")
   case m => m
 }}

 assemblySettings

 test in assembly := {}

 mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
   {
     case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard
     case m if m.startsWith("META-INF") => MergeStrategy.discard
     case PathList("javax", "servlet", xs @ _*) => MergeStrategy.first
     case PathList("org", "apache", xs @ _*) => MergeStrategy.first
     case PathList("org", "jboss", xs @ _*) => MergeStrategy.first
     case PathList("org", "slf4j", xs @ _*) => MergeStrategy.discard
     case "about.html" => MergeStrategy.rename
     case "reference.conf" => MergeStrategy.concat
     case _ => MergeStrategy.first
   }
 }
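
 (With these settings in place, running sbt assembly should produce the fat jar.)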

 On Tue, Nov 11, 2014 at 3:04 PM, Mohammed Guller moham...@glassbeam.com
 wrote:

 Actually, it is possible to integrate Spark 1.1.0 with Play 2.2.x.

 Here is a sample build.sbt file:

 name := "xyz"

 version := "0.1"

 scalaVersion := "2.10.4"

 libraryDependencies ++= Seq(
   jdbc,
   anorm,
   cache,
   "org.apache.spark" %% "spark-core" % "1.1.0",
   "com.typesafe.akka" %% "akka-actor" % "2.2.3",
   "com.typesafe.akka" %% "akka-slf4j" % "2.2.3",
   "org.apache.spark" %% "spark-sql" % "1.1.0"
 )

 play.Project.playScalaSettings


 Mohammed

 -Original Message-
 From: Patrick Wendell [mailto:pwend...@gmail.com]
 Sent: Tuesday, November 11, 2014 2:06 PM
 To: Akshat Aranya
 Cc: user@spark.apache.org
 Subject: Re: Spark and Play

 Hi There,

 Because Akka versions are not binary compatible with one another, it
 might not be possible to integrate Play with Spark 1.1.0.

 - Patrick

 On Tue, Nov 11, 2014 at 8:21 AM, Akshat Aranya aara...@gmail.com wrote:
  Hi,
 
  Sorry if this has been asked before; I didn't find a satisfactory
  answer when searching.  How can I integrate a Play application with
  Spark?  I'm running into akka-actor version issues.  Play 2.2.x
  uses akka-actor 2.0, whereas Play 2.3.x uses akka-actor 2.3.4, neither
  of which works well with Spark 1.1.0.  Is there something I should do
  with libraryDependencies in my build.sbt to make it work?
 
  Thanks,
  Akshat






-- 
Donald Szeto
PredictionIO


Re: deploying a model built in mllib

2014-11-07 Thread Donald Szeto
Hi Chirag,

Could you please provide more information on your Java server environment?

Regards,
Donald

On Fri, Nov 7, 2014 at 9:57 AM, chirag lakhani chirag.lakh...@gmail.com
wrote:

 Thanks for letting me know about this; it looks pretty interesting.  From
 reading the documentation, it seems that the server must be built on a Spark
 cluster. Is that correct?  Is it possible to deploy it on a Java
 server?  That is how we are currently running our web app.



 On Tue, Nov 4, 2014 at 7:57 PM, Simon Chan simonc...@gmail.com wrote:

 The latest version of PredictionIO, which is now under the Apache 2 license,
 supports the deployment of MLlib models in production.

 The engine you build will include a few components, such as:
 - Data - includes Data Source and Data Preparator
 - Algorithm(s)
 - Serving
 I believe that you can do the feature vector creation inside the Data
 Preparator component. (See the sketch below for how these pieces fit together.)
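
 A minimal plain-Scala sketch of that component layout (illustrative only;
 these traits are not PredictionIO's actual API):

 // Each stage is a small contract; the framework wires them into a pipeline.
 trait DataSource[TD] { def readTraining(): TD }        // raw training data
 trait Preparator[TD, PD] { def prepare(td: TD): PD }   // e.g. feature vectors
 trait Algorithm[PD, M, Q, P] {
   def train(pd: PD): M                // build the model from prepared data
   def predict(model: M, query: Q): P  // score an incoming query
 }
 trait Serving[Q, P] { def serve(q: Q, ps: Seq[P]): P } // combine/post-process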

 Currently, the package comes with two templates: 1) Collaborative
 Filtering Engine Template, with MLlib ALS; 2) Classification Engine
 Template, with MLlib Naive Bayes. The latter may be useful to you, and
 you can customize the Algorithm component, too.

 I have just created a doc: http://docs.prediction.io/0.8.1/templates/
 Love to hear your feedback!

 Regards,
 Simon



 On Mon, Oct 27, 2014 at 11:03 AM, chirag lakhani 
 chirag.lakh...@gmail.com wrote:

 Would pipelining include model export?  I didn't see that in the documentation.

 Are there ways that this is being done currently?



 On Mon, Oct 27, 2014 at 12:39 PM, Xiangrui Meng men...@gmail.com
 wrote:

 We are working on the pipeline features, which would make this
 procedure much easier in MLlib. This is still a WIP and the main JIRA
 is at:

 https://issues.apache.org/jira/browse/SPARK-1856

 Best,
 Xiangrui

 On Mon, Oct 27, 2014 at 8:56 AM, chirag lakhani
 chirag.lakh...@gmail.com wrote:
  Hello,

  I have been prototyping a text classification model that my company would
  like to eventually put into production.  Our technology stack is currently
  Java-based, but we would like to be able to build our models in Spark/MLlib
  and then export something like a PMML file, which can be used for model
  scoring in real time.

  I have been using scikit-learn, where I am able to take the training data,
  convert the text data into a sparse format, and then take the other
  features and use the dictionary vectorizer to do one-hot encoding for the
  other categorical variables.  All of those things seem to be possible in
  MLlib, but I am still puzzled about how they can be packaged in such a way
  that the incoming data can first be turned into feature vectors and then
  evaluated as well.

  Are there any best practices for this type of thing in Spark?  I hope this
  is clear, but if anything is confusing then please let me know.

  Thanks,

  Chirag
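
  (For the text side of this, MLlib's feature hashing covers the sparse
  conversion; a minimal sketch, assuming Spark 1.1+, with the one-hot step
  hand-rolled since MLlib had no dictionary-vectorizer equivalent at the
  time:)

  import org.apache.spark.mllib.feature.HashingTF
  import org.apache.spark.mllib.linalg.{Vector, Vectors}

  object FeatureSketch extends App {
    // Sparse term-frequency vector for tokenized text, via feature hashing.
    val tf = new HashingTF(numFeatures = 1 << 18)
    val textVec: Vector = tf.transform(Seq("spark", "mllib", "text"))

    // Hand-rolled one-hot encoding for a single categorical feature.
    val categories = Seq("red", "green", "blue").zipWithIndex.toMap
    def oneHot(value: String): Vector =
      Vectors.sparse(categories.size, Array(categories(value)), Array(1.0))

    println(textVec.size, oneHot("green"))
  }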







-- 
Donald Szeto
PredictionIO