If your retraining is at the month level then I don’t see the need for real-time updates. PredictionIO and Lambda-style background model calculation seem fine.
As to expert feedback: if you are using a classifier-type fraud detector, take the expert’s labeling of conditions as input to the classifier training. This is not reinforcement learning per se, just adding training data. If you have something else in mind, I’m still not seeing the real-time need for model updates.

On Sep 27, 2016, at 11:58 AM, Georg Heiler <[email protected]> wrote:

@Marcin: For now, I was thinking about 3 levels of retraining:
- a constant stream of events is evaluated; for some, quick feedback will be available within < 1 month <<< regular retraining
- for others, feedback will take 1 to 3 months <<< retraining once per month, plus a special punishment/reward depending on whether the model was correct on that feedback (reinforcement learning)
- predictive improvement of the model: e.g. if the analyst knows that a certain condition/product is most likely fraud but it is not yet represented in past data, this should be adjustable

@Pat: The current algorithm is still in a prototype stage. I am thinking about a deployment option but am still unsure which solution is the right one to move forward with.

Pat Ferrel <[email protected]> wrote on Tue, Sep 27, 2016 at 19:36:

Two examples of real-time model updates:

1) Think of a MAB: it gets real-time events and, in most implementations, makes real-time model updates. Yes, as Donald says, there still needs to be a store of events for various bootstrap or restart scenarios.

2) There are cases where parts of the model would benefit from real-time updates. Here I’m thinking of the UR, where properties are being added or changed for items already in the model. This should not require model retraining, but in the current PIO it does.

#1 is pure Kappa; #2 may be solved with a hybrid Lambda/Kappa.
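Pat’s MAB example (#1, the pure-Kappa case) can be sketched in a few lines. This is a minimal, illustrative epsilon-greedy bandit, not PredictionIO code; the class name and the `replay` helper are invented for the sketch. It shows both halves of his point: the model updates on every event in real time, while a raw event log is kept so the model can be rebuilt on restart, as Donald notes is still necessary:

```python
import random

class EpsilonGreedyBandit:
    """Bandit whose model (per-arm mean rewards) updates in real time,
    one event at a time, while an event log supports restart/bootstrap."""

    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms  # running mean reward per arm
        self.event_log = []           # canonical event store
        self.rng = random.Random(seed)

    def select_arm(self):
        # Explore with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        return max(range(len(self.counts)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Real-time model update: incremental mean, no batch retraining.
        self.event_log.append((arm, reward))
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

    @classmethod
    def replay(cls, n_arms, events, epsilon=0.1):
        # Restart scenario: rebuild model state by replaying the event log.
        bandit = cls(n_arms, epsilon)
        for arm, reward in events:
            bandit.update(arm, reward)
        return bandit
```

Replaying the log reproduces the exact in-memory model, which is why a canonical event store is enough to make the otherwise ephemeral Kappa-style model durable.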
@Georg must speak for his situation :-)

On Sep 27, 2016, at 10:23 AM, Marcin Ziemiński <[email protected]> wrote:

@Georg Could you tell us more about your idea regarding reinforcement learning? What would you use it for exactly? How would you represent episodes and the reward calculation/training process? Do you see it more as a model inside an engine, retrained from time to time, or as something more orthogonal to the current architecture? I am asking because I was thinking about using PIO for RL and how it could be adapted for it.

Best, Marcin

On Tue, Sep 27, 2016 at 19:09, Gustavo Frederico <[email protected]> wrote:

Just my 2 cents on reinforcement learning and retraining: I’ll likely put more things in my model that update more frequently, like stock and pre-processed product price, and these update fairly frequently (once per day, more or less).

Gustavo

On Tue, Sep 27, 2016 at 12:41 AM, Georg Heiler <[email protected]> wrote:

Hi Donald,

For me it is more about stacking and meta-learning. The selection of models could be performed offline. But:
1. I am concerned about keeping the model up to date (retraining).
2. I want some sort of reinforcement learning to improve/punish based on the correctness of new ground truth, about once per month.
3. I need very quick responses, more like an evaluation of a random forest / GBT / neural net without starting a YARN job.

Thank you all for the feedback so far.
Best regards,
Georg

Donald Szeto <[email protected]> wrote on Tue, Sep 27, 2016 at 06:34:

Sorry for side-tracking. I think the Kappa architecture is a promising paradigm, but batch processing from the canonical store to the serving-layer store should still be necessary. I believe this somewhat hybrid Kappa-Lambda architecture would be generic enough to handle many use cases.
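Georg’s point about punishing or rewarding the model as monthly ground truth arrives can be approximated without true reinforcement learning: at each monthly retrain, reweight the newly confirmed samples so the next training run focuses on the model’s mistakes. This matches Pat’s “just adding training data” suggestion and is closer to boosting-style reweighting than RL. A minimal sketch; the function name and default weights are made up for illustration:

```python
def feedback_weights(predictions, ground_truth, reward=0.5, punish=2.0):
    """Turn confirmed ground truth into per-sample training weights:
    down-weight events the model already got right (reward) and
    up-weight its mistakes (punish) for the next monthly retrain."""
    return [reward if pred == truth else punish
            for pred, truth in zip(predictions, ground_truth)]
```

The resulting list can be passed as a sample-weight argument to any learner that supports one (XGBoost and most scikit-learn classifiers do) during the regular monthly retraining run.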
If this is something that sounds good to everyone, we should drive PredictionIO in that direction.

Georg, are you talking about updating an existing model in different ways, evaluating the variants, and selecting one within a time constraint, say every 1 second?

On Mon, Sep 26, 2016 at 4:11 PM, Pat Ferrel <[email protected]> wrote:

If you need the model updated in real time you are talking about a Kappa architecture, and PredictionIO does not support that. It does Lambda only.

The MLlib-based recommenders use live contexts to serve from in-memory copies of the ALS models, but the models themselves are calculated in the background. There are several scaling issues with doing this, but it can be done.

On Sep 25, 2016, at 10:23 AM, Georg Heiler <[email protected]> wrote:

Wow, thanks. This is a great explanation.

So when I think about writing a Spark template for fraud detection (a combination of Spark SQL and XGBoost) that would require < 1 second latency, how should I store the model?

As far as I know, the startup of YARN jobs, e.g. a Spark job, is too slow for that. So it would be great if the model could be evaluated without using the cluster, or at least with a hot Spark context similar to spark-jobserver or SnappyData.io. Is this possible with prediction.io?

Regards,
Georg

Pat Ferrel <[email protected]> wrote on Sun, Sep 25, 2016 at 18:19:

Gustavo is correct. To put it another way, both Oryx and PredictionIO are based on what is called a Lambda Architecture. Loosely speaking, this means a potentially slow background task computes the predictive “model”, but this does not interfere with serving queries.
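Pat’s Lambda pattern, a slow background job computing the model while queries are served from an in-memory copy, boils down to an atomic model swap. A minimal in-process sketch (not how PIO or MLlib implements it; the class is invented for illustration):

```python
import threading

class ModelServer:
    """Lambda-style serving: queries are answered from the current
    in-memory model; when background training finishes, a single
    reference assignment swaps the new model in, so 'deploy' takes
    microseconds and never blocks queries."""

    def __init__(self, initial_model):
        self._model = initial_model
        self._lock = threading.Lock()

    def query(self, x):
        # A query sees either the old or the new model,
        # never a half-built one.
        model = self._model
        return model(x)

    def swap(self, new_model):
        # Called by the background training job once the model is ready.
        with self._lock:
            self._model = new_model
```

Because serving never touches the training path, a slow retrain (minutes or hours on YARN) is invisible to callers; only the final `swap` is on the serving side, and it is a single reference update.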
Then when the model is ready (stored in HDFS or Elasticsearch, depending on the template) it is deployed, and the switch happens in microseconds.

In the case of the Universal Recommender the model is stored in Elasticsearch. During `pio train` the new model is inserted into Elasticsearch and indexed. Once the indexing is done, the index alias used to serve queries is switched to the new index in one atomic action, so there is no downtime and any slow operation happens in the background without impeding queries.

The answer will vary somewhat with the template. Templates that use HDFS for storage may need to be re-deployed, but still the switch from the old model to the new one running takes microseconds.

PMML is not relevant to the discussion above and is in any case useless for many model types, including recommenders. If you look carefully at how it is implemented in Oryx, you will see that the PMML “models” for recommenders are not actually stored as PMML; only a minimal description of where the real data is stored is in PMML. Remember that it has all the problems of XML, including no good way to read it in parallel.

On Sep 25, 2016, at 7:47 AM, Gustavo Frederico <[email protected]> wrote:

I understand that querying in PredictionIO is very fast, as if it were an Elasticsearch query. Also recall that training is a separate step that often takes a long time in most learning systems, but as long as it’s not ridiculously long, it doesn’t matter that much.

Gustavo

On Sun, Sep 25, 2016 at 2:30 AM, Georg Heiler <[email protected]> wrote:

Hi PredictionIO users,

I wonder what the delay is for an engine evaluating a model in prediction.io. Are the models cached?
Another project, http://oryx.io/, generates PMML, which can be evaluated quickly from a production application.

I believe that the latency until the prediction happens is very often overlooked. How does PredictionIO handle this topic?

Best regards,
Georg
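For reference, the atomic index-alias switch Pat describes for the Universal Recommender corresponds to a single request to Elasticsearch’s `_aliases` endpoint, whose `remove`/`add` actions are applied atomically. A sketch that only builds the request body; the index and alias names are illustrative, and the actual UR internals may differ:

```python
def alias_swap_actions(alias, old_index, new_index):
    """Build the body for Elasticsearch's `POST /_aliases` endpoint.
    Both actions execute atomically, so queries against the alias
    switch from the old index to the freshly built one with no
    downtime and no window where the alias points nowhere."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }
```

After `pio train` finishes indexing the new model, posting a body like this flips the serving alias in one step; the slow work (building and indexing the new index) all happens beforehand, off the query path.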
