Just my 2 cents on the reinforcement learning and retraining: I will likely put more features into my model that update frequently, like stock levels and pre-processed product prices, and these change fairly often (once per day, more or less).
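For a daily cadence like that, even a dumb scheduler around the CLI works. A minimal sketch, assuming `pio` is on the PATH; depending on the template, `pio train` alone may suffice, or a redeploy may be needed (the 24-hour interval is just an example):

  import java.util.concurrent.{Executors, TimeUnit}
  import scala.sys.process._

  object DailyRetrain extends App {
    val scheduler = Executors.newSingleThreadScheduledExecutor()

    val retrain = new Runnable {
      def run(): Unit = {
        // Rebuild the model from the latest events, then swap it in.
        val exit = Seq("bash", "-c", "pio train && pio deploy").!
        if (exit != 0)
          Console.err.println(s"retraining failed with exit code $exit")
      }
    }

    // Run once at startup, then every 24 hours to match the daily
    // price/stock feed.
    scheduler.scheduleAtFixedRate(retrain, 0, 24, TimeUnit.HOURS)
  }
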
Gustavo

On Tue, Sep 27, 2016 at 12:41 AM, Georg Heiler <[email protected]> wrote:
> Hi Donald,
> For me it is more about stacking and meta-learning. The selection of models
> could be performed offline.
>
> But:
> 1. I am concerned about keeping the model up to date (retraining).
> 2. I want some sort of reinforcement learning to improve/punish the model
>    based on the correctness of new ground truth that arrives once per month.
> 3. I need very quick responses, especially something like evaluating a
>    random forest / GBT / neural net without starting a YARN job.
>
> Thank you all for the feedback so far.
> Best regards,
> Georg
>
> Donald Szeto <[email protected]> wrote on Tue., Sep. 27, 2016 at 06:34:
>>
>> Sorry for side-tracking. I think the Kappa architecture is a promising
>> paradigm, but batch processing from the canonical store to the
>> serving-layer store should still be included. I believe this somewhat
>> hybrid Kappa-Lambda architecture would be generic enough to handle many
>> use cases. If this sounds good to everyone, we should drive PredictionIO
>> in that direction.
>>
>> Georg, are you talking about updating an existing model in different
>> ways, evaluating the variants, and selecting one within a time
>> constraint, say every second?
>>
>> On Mon, Sep 26, 2016 at 4:11 PM, Pat Ferrel <[email protected]> wrote:
>>>
>>> If you need the model updated in real time, you are talking about a
>>> Kappa architecture, and PredictionIO does not support that. It does
>>> Lambda only.
>>>
>>> The MLlib-based recommenders use live contexts to serve from in-memory
>>> copies of the ALS models, but the models themselves are calculated in
>>> the background. There are several scaling issues with doing this, but it
>>> can be done.
>>>
>>> On Sep 25, 2016, at 10:23 AM, Georg Heiler <[email protected]> wrote:
>>>
>>> Wow, thanks. This is a great explanation.
>>>
>>> So if I think about writing a Spark template for fraud detection (a
>>> combination of Spark SQL and XGBoost) that requires <1 second latency,
>>> how should I store the model?
>>>
>>> As far as I know, the startup of a YARN job, e.g. a Spark job, is too
>>> slow for that. So it would be great if the model could be evaluated
>>> without using the cluster, or at least with a hot Spark context similar
>>> to spark-jobserver or SnappyData.io. Is this possible with
>>> prediction.io?
>>>
>>> Regards,
>>> Georg
>>>
>>> Pat Ferrel <[email protected]> wrote on Sun., Sep. 25, 2016 at 18:19:
>>>>
>>>> Gustavo is correct. To put it another way, both Oryx and PredictionIO
>>>> are based on what is called a Lambda architecture. Loosely speaking,
>>>> this means a potentially slow background task computes the predictive
>>>> "model", but this does not interfere with serving queries. Then, when
>>>> the model is ready (stored in HDFS or Elasticsearch, depending on the
>>>> template), it is deployed, and the switch happens in microseconds.
>>>>
>>>> In the case of the Universal Recommender, the model is stored in
>>>> Elasticsearch. During `pio train` the new model is inserted into
>>>> Elasticsearch and indexed. Once the indexing is done, the index alias
>>>> used to serve queries is switched to the new index in one atomic
>>>> action, so there is no downtime, and any slow operation happens in the
>>>> background without impeding queries.
>>>>
>>>> The answer will vary somewhat with the template. Templates that use
>>>> HDFS for storage may need to be re-deployed, but the switch from the
>>>> old model to the new one still takes microseconds.
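>>>>
>>>> A minimal sketch of that kind of atomic swap, using Elasticsearch's
>>>> standard `_aliases` REST endpoint; the index and alias names are
>>>> hypothetical, and this is an illustration, not the Universal
>>>> Recommender's actual code:
>>>>
>>>>   import java.net.URI
>>>>   // java.net.http requires Java 11+
>>>>   import java.net.http.{HttpClient, HttpRequest, HttpResponse}
>>>>
>>>>   object AliasSwap extends App {
>>>>     // Remove + add are applied in one atomic step, so queries against
>>>>     // the "ur_queries" alias never see a missing or half-indexed model.
>>>>     val body =
>>>>       """{
>>>>         |  "actions": [
>>>>         |    { "remove": { "index": "ur_model_v1", "alias": "ur_queries" } },
>>>>         |    { "add":    { "index": "ur_model_v2", "alias": "ur_queries" } }
>>>>         |  ]
>>>>         |}""".stripMargin
>>>>
>>>>     val request = HttpRequest
>>>>       .newBuilder(URI.create("http://localhost:9200/_aliases"))
>>>>       .header("Content-Type", "application/json")
>>>>       .POST(HttpRequest.BodyPublishers.ofString(body))
>>>>       .build()
>>>>
>>>>     val response = HttpClient.newHttpClient()
>>>>       .send(request, HttpResponse.BodyHandlers.ofString())
>>>>     println(response.body()) // {"acknowledged":true} on success
>>>>   }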
>>>>
>>>> PMML is not relevant to the discussion above, and it is in any case
>>>> useless for many model types, including recommenders. If you look
>>>> carefully at how this is implemented in Oryx, you will see that the
>>>> PMML "models" for recommenders are not actually stored as PMML; only a
>>>> minimal description of where the real data is stored is in PMML.
>>>> Remember that PMML has all the problems of XML, including no good way
>>>> to read it in parallel.
>>>>
>>>>
>>>> On Sep 25, 2016, at 7:47 AM, Gustavo Frederico <[email protected]> wrote:
>>>>
>>>> I understand that querying in PredictionIO is very fast, as if it were
>>>> an Elasticsearch query. Also recall that training happens at a separate
>>>> moment and often takes a long time in most learning systems, but as
>>>> long as it is not ridiculously long, it does not matter that much.
>>>>
>>>> Gustavo
>>>>
>>>> On Sun, Sep 25, 2016 at 2:30 AM, Georg Heiler <[email protected]> wrote:
>>>> > Hi PredictionIO users,
>>>> > I wonder what the delay is for an engine evaluating a model in
>>>> > prediction.io. Are the models cached?
>>>> >
>>>> > Another project, http://oryx.io/, generates PMML, which can be
>>>> > evaluated quickly from a production application.
>>>> >
>>>> > I believe that the latency until a prediction happens is very often
>>>> > overlooked. How does PredictionIO handle this topic?
>>>> >
>>>> > Best regards,
>>>> > Georg
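
On Georg's sub-second latency question: in-process scoring is one option. A minimal sketch with xgboost4j (not PredictionIO's own serving path; the model file path, feature layout, and single probability output are hypothetical), just to show that evaluating a trained booster needs neither a Spark context nor a YARN job:

  import ml.dmlc.xgboost4j.scala.{Booster, DMatrix, XGBoost}

  object LocalScorer {
    // Load once at process startup and keep the booster hot in memory;
    // the model was trained elsewhere (e.g. in a Spark job) and saved.
    val booster: Booster = XGBoost.loadModel("/models/fraud.model")

    def score(features: Array[Float]): Float = {
      // One transaction = one dense row.
      val row = new DMatrix(features, 1, features.length)
      booster.predict(row)(0)(0) // score for the single row
    }
  }

With the booster resident in memory, each query is a single dense predict call, typically milliseconds, which is well inside a <1 second budget.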
