Wow, thanks, this is a great explanation. So when I think about writing a Spark template for fraud detection (a combination of Spark SQL and XGBoost) that would require <1 second latency, how should I store the model?
As far as I know, the startup of YARN jobs, e.g. a Spark job, is too slow for that. So it would be great if the model could be evaluated without using the cluster, or at least with a hot Spark context similar to spark-jobserver or SnappyData.io. Is this possible for prediction.io?

Regards,
Georg

Pat Ferrel <[email protected]> wrote on Sun, 25 Sep 2016 at 18:19:

> Gustavo is correct. To put it another way, both Oryx and PredictionIO are
> based on what is called a Lambda Architecture. Loosely speaking, this means
> a potentially slow background task computes the predictive “model”, but
> this does not interfere with serving queries. Then, when the model is ready
> (stored in HDFS or Elasticsearch, depending on the template), it is
> deployed, and the switch happens in microseconds.
>
> In the case of the Universal Recommender the model is stored in
> Elasticsearch. During `pio train` the new model is inserted into
> Elasticsearch and indexed. Once the indexing is done, the index alias used
> to serve queries is switched to the new index in one atomic action, so
> there is no downtime and any slow operation happens in the background
> without impeding queries.
>
> The answer will vary somewhat with the template. Templates that use HDFS
> for storage may need to be re-deployed, but even then the switch from the
> old model to the new one takes microseconds.
>
> PMML is not relevant to the discussion above and is in any case useless
> for many model types, including recommenders. If you look carefully at how
> it is implemented in Oryx, you will see that the PMML “models” for
> recommenders are not actually stored as PMML; only a minimal description
> of where the real data is stored is in PMML. Remember that PMML has all
> the problems of XML, including no good way to read it in parallel.
>
>
> On Sep 25, 2016, at 7:47 AM, Gustavo Frederico <[email protected]> wrote:
>
> I understand that querying in PredictionIO is very fast, as if it were an
> Elasticsearch query. Also recall that training happens at a different
> moment and often takes a long time in most learning systems, but as long
> as it is not ridiculously long, that doesn't matter much.
>
> Gustavo
>
> On Sun, Sep 25, 2016 at 2:30 AM, Georg Heiler <[email protected]> wrote:
> > Hi PredictionIO users, I wonder what the delay is when an engine
> > evaluates a model in prediction.io. Are the models cached?
> >
> > Another project, http://oryx.io/, generates PMML, which can be evaluated
> > quickly from a production application.
> >
> > I believe that the latency until a prediction happens is very often
> > overlooked. How does PredictionIO handle this topic?
> >
> > Best regards,
> > Georg
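
[For readers following Pat's description of the alias switch: it maps onto a single call to Elasticsearch's _aliases endpoint, which applies all of its actions atomically. Below is a minimal sketch in Scala (using the JDK 11 HTTP client); the index names ur_model_v1/ur_model_v2 and the alias ur_queries are hypothetical, chosen only for illustration — the Universal Recommender manages its real index names itself.]

    import java.net.URI
    import java.net.http.{HttpClient, HttpRequest, HttpResponse}

    object AliasSwap {
      def main(args: Array[String]): Unit = {
        // Elasticsearch applies every action in one _aliases call
        // atomically, so queries against the alias never see a
        // half-switched state. Index and alias names are hypothetical.
        val body =
          """{
            |  "actions": [
            |    { "remove": { "index": "ur_model_v1", "alias": "ur_queries" } },
            |    { "add":    { "index": "ur_model_v2", "alias": "ur_queries" } }
            |  ]
            |}""".stripMargin

        val request = HttpRequest.newBuilder()
          .uri(URI.create("http://localhost:9200/_aliases"))
          .header("Content-Type", "application/json")
          .POST(HttpRequest.BodyPublishers.ofString(body))
          .build()

        val response = HttpClient.newHttpClient()
          .send(request, HttpResponse.BodyHandlers.ofString())
        println(response.body()) // {"acknowledged":true} on success
      }
    }

[Because the remove and add happen in one request, there is no window in which ur_queries points at no index, which is exactly why the switch Pat describes costs microseconds and no downtime.]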

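[On Georg's latency question: a common pattern is to train with Spark (e.g. xgboost4j-spark) but serve without it, loading the saved booster into the serving process and scoring single rows in memory, so no YARN or Spark startup sits on the request path. A minimal sketch with the xgboost4j Scala API follows; the model path "/models/fraud.xgb" and the 4-feature layout are assumptions for illustration, and the DMatrix constructor signature is as in recent xgboost4j releases.]

    import ml.dmlc.xgboost4j.scala.{Booster, DMatrix, XGBoost}

    object FraudScorer {
      // Load the booster once at process startup; scoring then needs no
      // Spark context or YARN job, so per-request latency stays well under
      // a second. "/models/fraud.xgb" is a hypothetical path to a model
      // saved after training (e.g. with xgboost4j-spark).
      private val booster: Booster = XGBoost.loadModel("/models/fraud.xgb")

      // Score one transaction given its already-assembled feature vector.
      def score(features: Array[Float]): Float = {
        val dmat = new DMatrix(features, 1, features.length, Float.NaN)
        try booster.predict(dmat)(0)(0)
        finally dmat.delete() // free the native memory behind the matrix
      }

      def main(args: Array[String]): Unit = {
        // A hypothetical 4-feature transaction, for illustration only.
        println(score(Array(120.5f, 1.0f, 0.0f, 3.0f)))
      }
    }

[This buys roughly what a hot Spark context via spark-jobserver would, except nothing heavier than a plain JVM process has to be kept warm.]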