Re: Velox Model Server
Mind if I ask what 1.3/1.4 ML features you are looking for?

On Saturday, June 20, 2015, Debasish Das debasish.da...@gmail.com wrote:
After getting used to Scala, writing Java is too much work :-) I am looking for a Scala-based project that uses Netty at its core (Spray is one example). prediction.io is an option, but it also looks quite complicated, and it does not use all the ML features that were added in 1.3/1.4. Velox is built on top of the ML / KeystoneML pipeline API, which is useful, but it still uses javax servlets and is not Netty based.

On Sat, Jun 20, 2015 at 10:25 AM, Sandy Ryza sandy.r...@cloudera.com wrote:
Oops, that link was for Oryx 1. Here's the repo for Oryx 2: https://github.com/OryxProject/oryx

On Sat, Jun 20, 2015 at 10:20 AM, Sandy Ryza sandy.r...@cloudera.com wrote:
Hi Debasish, The Oryx project (https://github.com/cloudera/oryx), which is Apache 2 licensed, contains a model server that can serve models built with MLlib. -Sandy

On Sat, Jun 20, 2015 at 8:00 AM, Charles Earl charles.ce...@gmail.com wrote:
Is Velox NOT open source?

On Saturday, June 20, 2015, Debasish Das debasish.da...@gmail.com wrote:
Hi, The demo of the end-to-end ML pipeline, including the model server component, at Spark Summit was really cool. I was wondering whether the Model Server component is based on Velox or uses a completely different architecture. https://github.com/amplab/velox-modelserver We are looking for an open source model server to build upon. Thanks. Deb

--
Charles

--
Donald Szeto
PredictionIO
Re: Model deployment help
Hi Shashidhar,

Our team at PredictionIO is trying to solve the production deployment of models. We built a powered-by-Spark framework (also certified on Spark by Databricks) that allows a user to build models with everything available from the Spark API, persist the models automatically with versioning, and deploy them as a REST service using simple CLI commands.

Regarding model degeneration and updates: if a half-second to a couple of seconds of downtime is acceptable, with PIO one could simply run pio train and pio deploy periodically with a cron job. To achieve virtually zero downtime, a load balancer could be set up in front of two pio deploy instances. Porting your current algorithm / model generation to PredictionIO should just be a copy-and-paste procedure.

We would be very grateful for any feedback that would improve the deployment process. We do not support PMML at the moment, but we are definitely interested in your use case.

You may get started with the documentation (http://docs.prediction.io/). You could also visit the engine template gallery (https://templates.prediction.io/) for quick, ready-to-use examples. PredictionIO is open source software under the Apache License 2.0 at https://github.com/PredictionIO/PredictionIO.

Looking forward to hearing your feedback!

Best Regards,
Donald

On Sat, Mar 21, 2015 at 10:40 AM, Shashidhar Rao raoshashidhar...@gmail.com wrote:
Hi, Apologies for the generic question. I am developing predictive models for the first time, and a model will be deployed in production very soon. Could somebody help me with model deployment in production? I have read quite a bit on model deployment and some books on database deployment. My questions concern how a model is updated without any downtime when the current model degenerates, how others are deploying models on production servers, and the current adoption of PMML in production.
Please point me to some good links or forums where I can learn more; most books do not cover this extensively, except for 'Mahout in Action' where it is explained in some detail, and I have also checked Stack Overflow without finding relevant answers.

What I understand:
1. Build a model using the current training set and test the model.
2. Deploy the model: put it in some location, load it, and predict when a scoring request comes in.
3. The model degenerates, so build a new model with new data. (Here I am somewhat confused: is the old data discarded completely, is the new model built purely from new data, or from a mix?)
4. Here I am stuck: how to update the model without any downtime during the transition period between the old model and the new model.

My naive solution would be: build the new model, save it in a new location, and update the new path in a properties file, or update the location in a database, once the save is done. Is this correct, or are there best practices available? A database is unlikely in my case.

Thanks in advance.

--
Donald Szeto
PredictionIO
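The zero-downtime transition Shashidhar asks about in step 4 is often handled by keeping the live model behind an atomic reference and flipping it only after the new model is fully built. A minimal sketch in Java; the names (ModelServer, Model, deploy) are hypothetical, not from any library discussed in the thread:

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

public class ModelServer {
    // A "model" here is just a scoring function; a real one would wrap
    // a deserialized MLlib or PMML model.
    interface Model extends Function<Double, Double> {}

    // The live model sits behind an atomic reference; readers never block.
    private final AtomicReference<Model> live = new AtomicReference<>();

    public ModelServer(Model initial) { live.set(initial); }

    // Serving threads always see either the old model or the new one,
    // never a half-updated one, so there is no downtime during a swap.
    public double score(double features) { return live.get().apply(features); }

    // Called by the retraining job only after the new model is fully
    // built and validated; the swap itself is a single pointer update.
    public void deploy(Model next) { live.set(next); }

    public static void main(String[] args) {
        ModelServer server = new ModelServer(x -> x * 2.0); // "old" model
        System.out.println(server.score(3.0));              // 6.0
        server.deploy(x -> x * 10.0);                       // hot-swap
        System.out.println(server.score(3.0));              // 30.0
    }
}
```

This is the in-process version of the properties-file idea above: instead of re-reading a path, the process loads the new model file itself and swaps the reference, so no request ever observes a missing model.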
Re: Spark and Play
Hi Akshat,

If your application is to serve results directly from a SparkContext, you may want to take a look at http://prediction.io. It integrates Spark with spray.io (another REST/web toolkit by Typesafe). Some heavy lifting is done here: https://github.com/PredictionIO/PredictionIO/blob/develop/core/src/main/scala/workflow/CreateServer.scala

Regards,
Donald

On Tue, Nov 11, 2014 at 11:35 PM, John Meehan jnmee...@gmail.com wrote:
You can also build a Play 2.2.x + Spark 1.1.0 fat jar with sbt-assembly for, e.g., yarn-client support, or for use with spark-shell for debugging:

play.Project.playScalaSettings

libraryDependencies ~= { _ map {
  case m if m.organization == "com.typesafe.play" =>
    m.exclude("commons-logging", "commons-logging")
  case m => m
}}

assemblySettings

test in assembly := {}

mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) => {
  case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard
  case m if m.startsWith("META-INF") => MergeStrategy.discard
  case PathList("javax", "servlet", xs @ _*) => MergeStrategy.first
  case PathList("org", "apache", xs @ _*) => MergeStrategy.first
  case PathList("org", "jboss", xs @ _*) => MergeStrategy.first
  case PathList("org", "slf4j", xs @ _*) => MergeStrategy.discard
  case "about.html" => MergeStrategy.rename
  case "reference.conf" => MergeStrategy.concat
  case _ => MergeStrategy.first
}}

On Tue, Nov 11, 2014 at 3:04 PM, Mohammed Guller moham...@glassbeam.com wrote:
Actually, it is possible to integrate Spark 1.1.0 with Play 2.2.x. Here is a sample build.sbt file:

name := "xyz"

version := "0.1"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  jdbc,
  anorm,
  cache,
  "org.apache.spark" %% "spark-core" % "1.1.0",
  "com.typesafe.akka" %% "akka-actor" % "2.2.3",
  "com.typesafe.akka" %% "akka-slf4j" % "2.2.3",
  "org.apache.spark" %% "spark-sql" % "1.1.0"
)

play.Project.playScalaSettings

Mohammed

-----Original Message-----
From: Patrick Wendell [mailto:pwend...@gmail.com]
Sent: Tuesday, November 11, 2014 2:06 PM
To: Akshat Aranya
Cc: user@spark.apache.org
Subject: Re: Spark and Play

Hi There,

Because Akka versions are not binary compatible with one another, it might not be possible to integrate Play with Spark 1.1.0.

- Patrick

On Tue, Nov 11, 2014 at 8:21 AM, Akshat Aranya aara...@gmail.com wrote:
Hi, Sorry if this has been asked before; I didn't find a satisfactory answer when searching. How can I integrate a Play application with Spark? I'm running into issues with akka-actor versions. Play 2.2.x uses akka-actor 2.0, whereas Play 2.3.x uses akka-actor 2.3.4, neither of which works fine with Spark 1.1.0. Is there something I should do with libraryDependencies in my build.sbt to make it work? Thanks, Akshat

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

--
Donald Szeto
PredictionIO
Re: deploying a model built in mllib
Hi Chirag,

Could you please provide more information on your Java server environment?

Regards,
Donald

On Fri, Nov 7, 2014 at 9:57 AM, chirag lakhani chirag.lakh...@gmail.com wrote:
Thanks for letting me know about this; it looks pretty interesting. From reading the documentation it seems that the server must be built on a Spark cluster, is that correct? Is it possible to deploy it on a Java server? That is how we are currently running our web app.

On Tue, Nov 4, 2014 at 7:57 PM, Simon Chan simonc...@gmail.com wrote:
The latest version of PredictionIO, which is now under the Apache 2 license, supports the deployment of MLlib models in production. The engine you build will include a few components, such as:
- Data - includes Data Source and Data Preparator
- Algorithm(s)
- Serving
I believe that you can do the feature vector creation inside the Data Preparator component. Currently, the package comes with two templates: 1) Collaborative Filtering Engine Template - with MLlib ALS; 2) Classification Engine Template - with MLlib Naive Bayes. The latter may be useful to you, and you can customize the Algorithm component, too. I have just created a doc: http://docs.prediction.io/0.8.1/templates/ Love to hear your feedback!

Regards,
Simon

On Mon, Oct 27, 2014 at 11:03 AM, chirag lakhani chirag.lakh...@gmail.com wrote:
Would pipelining include model export? I didn't see that in the documentation. Are there ways that this is being done currently?

On Mon, Oct 27, 2014 at 12:39 PM, Xiangrui Meng men...@gmail.com wrote:
We are working on the pipeline features, which would make this procedure much easier in MLlib. This is still a WIP and the main JIRA is at: https://issues.apache.org/jira/browse/SPARK-1856 Best, Xiangrui

On Mon, Oct 27, 2014 at 8:56 AM, chirag lakhani chirag.lakh...@gmail.com wrote:
Hello, I have been prototyping a text classification model that my company would like to eventually put into production.
Our technology stack is currently Java based, but we would like to be able to build our models in Spark/MLlib and then export something like a PMML file that can be used for model scoring in real time. I have been using scikit-learn, where I can take the training data, convert the text into a sparse data format, and then use the dictionary vectorizer to do one-hot encoding for the other categorical variables. All of those things seem to be possible in MLlib, but I am still puzzled about how they can be packaged in such a way that incoming data can first be turned into feature vectors and then evaluated. Are there any best practices for this type of thing in Spark? I hope this is clear, but if anything is confusing please let me know.

Thanks,
Chirag

--
Donald Szeto
PredictionIO
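The packaging concern Chirag raises (the fitted vectorizer must ship with the model so scoring applies the exact training-time transformation) can be sketched as a single deployable unit. A hedged Java illustration; ScoringPipeline and its fields are hypothetical stand-ins for a fitted one-hot encoder and a trained linear model, not any real MLlib or PMML API:

```java
import java.util.Arrays;
import java.util.List;

public class ScoringPipeline {
    private final List<String> vocabulary; // stands in for a fitted one-hot encoder
    private final double[] weights;        // stands in for a trained linear model

    public ScoringPipeline(List<String> vocabulary, double[] weights) {
        this.vocabulary = vocabulary;
        this.weights = weights;
    }

    // One-hot encode a categorical value against the fitted vocabulary.
    double[] vectorize(String category) {
        double[] v = new double[vocabulary.size()];
        int i = vocabulary.indexOf(category);
        if (i >= 0) v[i] = 1.0;            // unseen categories map to all zeros
        return v;
    }

    // Raw input -> feature vector -> score, all inside one serialized unit,
    // so the web tier never re-implements the feature engineering.
    public double score(String category) {
        double[] v = vectorize(category);
        double s = 0.0;
        for (int i = 0; i < v.length; i++) s += v[i] * weights[i];
        return s;
    }

    public static void main(String[] args) {
        ScoringPipeline p = new ScoringPipeline(
                Arrays.asList("sports", "politics", "tech"),
                new double[]{0.2, 0.5, 0.9});
        System.out.println(p.score("tech"));    // 0.9
        System.out.println(p.score("unknown")); // 0.0
    }
}
```

This is essentially what a PMML file or PredictionIO's Data Preparator plus Algorithm components bundle for you: the transformation state and the model parameters travel together from training to serving.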