@Georg Could you tell us more about your idea regarding reinforcement learning? What would you use it for, exactly? How would you represent episodes, and what would the reward calculation and training process look like? Do you see it more as a model inside an engine, retrained from time to time, or as something orthogonal to the current architecture? I am asking because I have been thinking about using PIO for RL and how it could be adapted for that.
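To make my question more concrete, the kind of representation I have in mind looks roughly like the sketch below. All names are hypothetical and none of this is an existing PIO API -- it is only meant to illustrate what "episode" and "reward" could mean here:

    // Rough sketch only -- hypothetical names, not an existing PIO API.
    case class Step(
      state: Map[String, Double], // feature snapshot at decision time
      action: String,             // e.g. which model/variant produced the response
      reward: Double              // 0.0 until new ground truth arrives
    )

    case class Episode(id: String, steps: Vector[Step]) {
      // Discounted return: the quantity a periodic retraining job
      // would score against once ground truth is known.
      def discountedReturn(gamma: Double): Double =
        steps.zipWithIndex.map { case (s, t) => math.pow(gamma, t) * s.reward }.sum
    }

    object EpisodeDemo extends App {
      val ep = Episode("user-42-session-7", Vector(
        Step(Map("price" -> 9.99), "gbt-v3", reward = 0.0),
        Step(Map("price" -> 9.99), "gbt-v4", reward = 1.0) // confirmed correct later
      ))
      println(ep.discountedReturn(gamma = 0.9)) // prints 0.9
    }

The event server would accumulate steps as ordinary events, and a periodic job -- monthly, to match the cadence of new ground truth Georg mentions below -- would fold the confirmed outcomes into rewards and retrain.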
Best,
Marcin

On Tue, Sep 27, 2016 at 19:09, Gustavo Frederico <[email protected]> wrote:

> Just my 2 cents on the reinforcement learning and retraining: I'll
> likely put more things in my model that update more frequently, like
> stock and pre-processed product price, and these update fairly
> frequently (once per day, more or less).
>
> Gustavo
>
> On Tue, Sep 27, 2016 at 12:41 AM, Georg Heiler <[email protected]> wrote:
>
>> Hi Donald,
>>
>> For me it is more about stacking and meta-learning. The selection of
>> models could be performed offline.
>>
>> But:
>> 1. I am concerned about keeping the model up to date, i.e. retraining.
>> 2. I would like some sort of reinforcement learning to reward/punish
>>    the model based on the correctness of new ground truth, which
>>    arrives about once a month.
>> 3. I need very quick responses -- more like an evaluation of a random
>>    forest / GBT / neural net without starting a YARN job.
>>
>> Thank you all for the feedback so far.
>>
>> Best regards,
>> Georg
>>
>> On Tue, Sep 27, 2016 at 06:34, Donald Szeto <[email protected]> wrote:
>>
>>> Sorry for side-tracking. I think the Kappa architecture is a
>>> promising paradigm, but including batch processing from the canonical
>>> store to the serving-layer store should still be necessary. I believe
>>> this somewhat hybrid Kappa-Lambda architecture would be generic
>>> enough to handle many use cases. If this is something that sounds
>>> good to everyone, we should drive PredictionIO in that direction.
>>>
>>> Georg, are you talking about updating an existing model in different
>>> ways, evaluating the variants, and selecting one within a time
>>> constraint, say every 1 second?
>>>
>>> On Mon, Sep 26, 2016 at 4:11 PM, Pat Ferrel <[email protected]> wrote:
>>>
>>>> If you need the model updated in realtime, you are talking about a
>>>> Kappa architecture, and PredictionIO does not support that. It does
>>>> Lambda only.
>>>>
>>>> The MLlib-based recommenders use live contexts to serve from
>>>> in-memory copies of the ALS models, but the models themselves were
>>>> calculated in the background. There are several scaling issues with
>>>> doing this, but it can be done.
>>>>
>>>> On Sep 25, 2016, at 10:23 AM, Georg Heiler <[email protected]> wrote:
>>>>
>>>>> Wow, thanks. This is a great explanation.
>>>>>
>>>>> So when I think about writing a Spark template for fraud detection
>>>>> (a combination of Spark SQL and XGBoost) that would require
>>>>> <1 second latency, how should I store the model?
>>>>>
>>>>> As far as I know, the startup of YARN jobs, e.g. a Spark job, is
>>>>> too slow for that. So it would be great if the model could be
>>>>> evaluated without using the cluster, or at least with a hot Spark
>>>>> context similar to spark-jobserver or SnappyData.io. Is this
>>>>> possible for prediction.io?
>>>>>
>>>>> Regards,
>>>>> Georg
>>>>>
>>>>> On Sun, Sep 25, 2016 at 18:19, Pat Ferrel <[email protected]> wrote:
>>>>>
>>>>>> Gustavo is correct. To put it another way, both Oryx and
>>>>>> PredictionIO are based on what is called a Lambda architecture.
>>>>>> Loosely speaking, this means a potentially slow background task
>>>>>> computes the predictive "model", but this does not interfere with
>>>>>> serving queries. Then, when the model is ready (stored in HDFS or
>>>>>> Elasticsearch, depending on the template), it is deployed, and the
>>>>>> switch happens in microseconds.
>>>>>>
>>>>>> In the case of the Universal Recommender the model is stored in
>>>>>> Elasticsearch. During `pio train` the new model is inserted into
>>>>>> Elasticsearch and indexed. Once the indexing is done, the index
>>>>>> alias used to serve queries is switched to the new index in one
>>>>>> atomic action, so there is no downtime and any slow operation
>>>>>> happens in the background without impeding queries.
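Side note from me: if I understand Pat correctly, that atomic switch boils down to a single `_aliases` call against Elasticsearch. Below is a minimal Scala sketch of such a call using only the JDK HTTP client -- the index and alias names (ur_model_old, ur_model_new, ur_serving) are made up by me, and this is just an illustration of the idea, not the Universal Recommender's actual code:

    import java.net.URI
    import java.net.http.{HttpClient, HttpRequest, HttpResponse}

    object AliasSwapDemo extends App {
      // Re-point the serving alias from the old index to the new one in a
      // single atomic _aliases action, so queries never see a half-built model.
      val body =
        """{ "actions": [
          |  { "remove": { "index": "ur_model_old", "alias": "ur_serving" } },
          |  { "add":    { "index": "ur_model_new", "alias": "ur_serving" } }
          |] }""".stripMargin

      val request = HttpRequest.newBuilder()
        .uri(URI.create("http://localhost:9200/_aliases"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build()

      val response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString())
      println(s"${response.statusCode()} ${response.body()}")
    }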
>>>>>> The answer will vary somewhat with the template. Templates that
>>>>>> use HDFS for storage may need to be re-deployed, but even then the
>>>>>> switch from the old model to the new one running takes
>>>>>> microseconds.
>>>>>>
>>>>>> PMML is not relevant to the discussion above and is in any case
>>>>>> useless for many model types, including recommenders. If you look
>>>>>> carefully at how it is implemented in Oryx, you will see that the
>>>>>> PMML "models" for recommenders are not actually stored as PMML;
>>>>>> only a minimal description of where the real data is stored goes
>>>>>> into the PMML. Remember that PMML has all the problems of XML,
>>>>>> including no good way to read it in parallel.
>>>>>>
>>>>>> On Sep 25, 2016, at 7:47 AM, Gustavo Frederico <[email protected]> wrote:
>>>>>>
>>>>>>> I understand that querying in PredictionIO is very fast, as if it
>>>>>>> were an Elasticsearch query. Also recall that training happens at
>>>>>>> a different moment and often takes a long time in most learning
>>>>>>> systems, but as long as it is not ridiculously long, it does not
>>>>>>> matter that much.
>>>>>>>
>>>>>>> Gustavo
>>>>>>>
>>>>>>> On Sun, Sep 25, 2016 at 2:30 AM, Georg Heiler <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi PredictionIO users,
>>>>>>>>
>>>>>>>> I wonder what the delay is when an engine evaluates a model in
>>>>>>>> prediction.io. Are the models cached?
>>>>>>>>
>>>>>>>> Another project, http://oryx.io/, generates PMML, which can be
>>>>>>>> evaluated quickly from a production application.
>>>>>>>>
>>>>>>>> I believe that the latency until the prediction happens is very
>>>>>>>> often overlooked. How does PredictionIO handle this topic?
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Georg
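One more thought from my side, connecting Pat's point about serving from in-memory copies with Georg's <1 second requirement: the pattern I picture looks roughly like the sketch below -- load the trained model into process memory at deploy time, answer queries with plain lookups, and let the background retrain publish its result through an atomic reference swap. This is my own illustration of the idea, not PIO internals:

    import java.util.concurrent.atomic.AtomicReference

    // My own sketch of the "load once, serve from memory, swap atomically"
    // pattern discussed in this thread -- not PIO internals.
    object HotServingDemo extends App {
      // Stand-in for a trained model, e.g. scores produced by `pio train`.
      final case class Model(version: Int, scores: Map[String, Double]) {
        def predict(item: String): Double = scores.getOrElse(item, 0.0)
      }

      // The deployed engine holds the current model in process memory.
      private val current = new AtomicReference(Model(1, Map("item-1" -> 0.3)))

      // Query path: a plain lookup, no YARN/Spark startup on the hot path.
      def predict(item: String): Double = current.get.predict(item)

      // Background path: when retraining finishes, publish the new model
      // in one atomic reference swap; in-flight queries are never blocked.
      def publish(next: Model): Unit = current.set(next)

      println(predict("item-1")) // 0.3 (model v1)
      publish(Model(2, Map("item-1" -> 0.9)))
      println(predict("item-1")) // 0.9 (model v2)
    }

If `pio deploy` already keeps the model resident like this, then for the fraud-detection case the remaining question is mainly how to run the XGBoost/GBT evaluation itself in-process rather than as a cluster job.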
