By the way, I created a JIRA for supporting an initial model for warm-start ALS here: https://issues.apache.org/jira/browse/SPARK-13856

On Fri, 11 Mar 2016 at 09:14, Nick Pentreath <[email protected]> wrote:

> Sean's old Myrrix slides contain an overview of the fold-in math:
> http://www.slideshare.net/srowen/big-practical-recommendations-with-alternating-least-squares/14?src=clipshare
>
> I never quite got around to actually incorporating it into my own
> ALS-based systems, because in the end I just re-computed models every day
> and found other ways to incorporate real-time elements using Elasticsearch.
>
> On Fri, 11 Mar 2016 at 01:12 Chris Fregly <[email protected]> wrote:
>
>> @Colin - you're asking the million-dollar question that a lot of
>> people are trying to answer. This was literally the #1 most-asked
>> question in every city on my recent worldwide meetup tour.
>>
>> I've been pointing people to my old Databricks co-worker's
>> streaming-matrix-factorization project:
>> https://github.com/brkyvz/streaming-matrix-factorization
>> He got tired of everyone asking about this - and cranked it out over a
>> weekend. Love that guy, Burak! :)
>>
>> I've attempted (unsuccessfully, so far) to deploy exactly what you're
>> trying to do here:
>> https://github.com/fluxcapacitor/pipeline/blob/master/myapps/streaming/src/main/scala/com/advancedspark/streaming/rating/ml/TrainMFIncremental.scala
>>
>> We're a couple pull requests away from making this happen. You can see
>> my comments and open GitHub issues for the remaining bits.
>>
>> And this will be my focus in the next week or so as I prepare for an
>> upcoming conference. Keep an eye on this repo if you'd like.
>>
>> @Sean: thanks for the link. I knew Oryx was doing this somehow - and I
>> kept meaning to see how you were doing it. I'll likely incorporate some of
>> your stuff into my final solution.
>>
>> On Thu, Mar 10, 2016 at 3:35 PM, Sean Owen <[email protected]> wrote:
>>
>>> While it isn't crazy, I am not sure how valid it is to build a model
>>> off of only a chunk of recent data and then merge it into another
>>> model in any direct way. They're not really sharing a basis, so you
>>> can't just average them.
>>>
>>> My experience with this aspect suggests you should try to update the
>>> existing model in place on the fly. In short, you figure out how much
>>> the new input ought to change your estimate of the (user,item)
>>> association. Positive interactions should increase it a bit, etc. Then
>>> you work out how the item vector would change if the user vector were
>>> fixed in order to accomplish that change, with a bit of linear
>>> algebra. Vice versa for the user vector. Of course, those changes affect
>>> the rest of the matrix too, but that's the 'approximate' bit.
>>>
>>> I happen to have an implementation of this in the context of a
>>> Spark ALS model, though raw source code may be hard to read. If it's
>>> of interest we can discuss offline (or online here to the extent it's
>>> relevant to Spark users).
>>>
>>> https://github.com/OryxProject/oryx/blob/91004a03413eef0fdfd6e75a61b68248d11db0e5/app/oryx-app/src/main/java/com/cloudera/oryx/app/speed/als/ALSSpeedModelManager.java#L192
>>>
>>> On Thu, Mar 10, 2016 at 8:01 PM, Colin Woodbury <[email protected]> wrote:
>>> > Hi there, I'm wondering if it's possible (or feasible) to combine the
>>> > feature matrices of two MatrixFactorizationModels that share a user and
>>> > product set.
>>> >
>>> > Specifically, one model would be the "on-going" model, and the other is one
>>> > trained only on the most recent aggregation of some event data. My overall
>>> > goal is to try to approximate "online" training, as ALS doesn't support
>>> > streaming, and it also isn't possible to "seed" the ALS training process
>>> > with an already trained model.
>>> >
>>> > Since the two Models would share a user/product ID space, can their feature
>>> > matrices be merged? For instance via:
>>> >
>>> > 1. Adding feature vectors together for user/product vectors that appear in
>>> > both models
>>> > 2. Averaging said vectors instead
>>> > 3. Some other linear algebra operation
>>> >
>>> > Unfortunately, I'm fairly ignorant as to the internal mechanics of ALS
>>> > itself. Is what I'm asking possible?
>>> >
>>> > Thank you,
>>> > Colin
>>
>> --
>>
>> *Chris Fregly*
>> Principal Data Solutions Engineer
>> IBM Spark Technology Center, San Francisco, CA
>> http://spark.tc | http://advancedspark.com
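
For anyone reading this thread later, here is a minimal sketch of the in-place "fold-in" update Sean describes, in plain Python rather than Spark. It is not the Oryx implementation. It assumes the item factors Y are held fixed and relies on the least-squares relation x_u = (Y^T Y + lam*I)^-1 Y^T q_u, so changing one target entry by delta shifts the user vector by delta * (Y^T Y + lam*I)^-1 y_i. The function names, the `rate` parameter, and the rule for choosing the target value are all hypothetical, picked only for illustration.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def gram(factors, lam=0.0):
    """Return Y^T Y + lam * I for a list of k-dimensional factor vectors."""
    k = len(factors[0])
    return [[sum(y[i] * y[j] for y in factors) + (lam if i == j else 0.0)
             for j in range(k)] for i in range(k)]


def solve(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; try a larger lam")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def nudged_target(current, rate=0.5):
    """Hypothetical rule: a positive interaction moves the estimate
    a fraction `rate` of the way toward 1.0."""
    return current + rate * (1.0 - current)


def fold_in_user(x_u, y_i, item_factors, target, lam=0.0):
    """Update user vector x_u with the item factors held fixed, so that
    the (user, item) estimate x_u . y_i moves toward `target`."""
    delta = target - dot(x_u, y_i)
    g = gram(item_factors, lam)
    d_x = solve(g, [delta * v for v in y_i])  # delta * G^-1 * y_i
    return [a + b for a, b in zip(x_u, d_x)]


if __name__ == "__main__":
    items = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy item factors, k = 2
    user = [0.2, 0.1]
    before = dot(user, items[2])
    updated = fold_in_user(user, items[2], items, nudged_target(before))
    print("estimate before:", before, "after:", dot(updated, items[2]))
```

The item-side update is symmetric: hold the user factors fixed and use their Gram matrix instead. Because y_i^T (Y^T Y + lam*I)^-1 y_i is at most 1 when y_i is one of the rows of Y, the new estimate moves toward the target without overshooting it. The other entries in that user's row shift too, which is the "approximate" part Sean mentions.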
