Sean's old Myrrix slides contain an overview of the fold-in math:
http://www.slideshare.net/srowen/big-practical-recommendations-with-alternating-least-squares/14?src=clipshare

I never quite got around to actually incorporating it into my own ALS-based
systems, because in the end I just re-computed models every day and found
other ways to incorporate real-time elements using Elasticsearch.
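
For anyone skimming the thread, here is a rough sketch of the fold-in idea
Sean describes in the quoted message: hold the item vectors fixed, decide how
far the (user,item) estimate should move, and solve a small linear system for
the change in the user vector. This is only an illustration under my own
assumptions, not the Oryx or Myrrix code: the helper names (gram, solve,
fold_in_user), the toy factors, and the "move halfway toward 1.0" target are
all made up for the example, and regularization is left out.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def gram(Y):
    """Return Y^T Y (k x k) for a list of k-dimensional item vectors."""
    k = len(Y[0])
    return [[sum(y[a] * y[b] for y in Y) for b in range(k)] for a in range(k)]

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fold_in_user(x_u, y_i, YtY, target):
    """Nudge x_u so that x_u . y_i moves toward `target`, items held fixed.

    Least-squares step: d_x = (Y^T Y)^{-1} y_i * (target - x_u . y_i).
    """
    d_q = target - dot(x_u, y_i)
    d_x = solve(YtY, [d_q * v for v in y_i])
    return [x + d for x, d in zip(x_u, d_x)]

# Toy model: three items with k = 2 factors (hypothetical values).
Y = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x_u = [0.2, 0.1]
y_i = Y[2]                          # the user just interacted with item 2

old = dot(x_u, y_i)
target = old + 0.5 * (1.0 - old)    # assumption: move halfway toward 1.0
new_x_u = fold_in_user(x_u, y_i, gram(Y), target)
new = dot(new_x_u, y_i)
print(old, target, new)
```

The new estimate lands between the old one and the target rather than exactly
on it, because the step is a least-squares fit against all item vectors. The
item-side update is the mirror image, with the user vectors held fixed.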

On Fri, 11 Mar 2016 at 01:12 Chris Fregly <[email protected]> wrote:

> @Colin-  you're asking the million-dollar question that a lot of people
> are trying to answer.  This was literally the most-asked question in every
> city on my recent world-wide meetup tour.
>
> I've been pointing people to my old Databricks co-worker's
> streaming-matrix-factorization project:
> https://github.com/brkyvz/streaming-matrix-factorization  He got tired of
> everyone asking about this - and cranked it out over a weekend.  Love that
> guy, Burak!  :)
>
> I've attempted (unsuccessfully, so far) to deploy exactly what you're
> trying to do here:
> https://github.com/fluxcapacitor/pipeline/blob/master/myapps/streaming/src/main/scala/com/advancedspark/streaming/rating/ml/TrainMFIncremental.scala
>
> We're a couple pull requests away from making this happen.  You can see my
> comments and open github issues for the remaining bits.
>
> And this will be my focus in the next week or so as I prepare for an
> upcoming conference.  Keep an eye on this repo if you'd like.
>
> @Sean:  thanks for the link.  I knew Oryx was doing this somehow - and I
> kept meaning to see how you were doing it.  I'll likely incorporate some of
> your stuff into my final solution.
>
>
> On Thu, Mar 10, 2016 at 3:35 PM, Sean Owen <[email protected]> wrote:
>
>> While it isn't crazy, I am not sure how valid it is to build a model
>> off of only a chunk of recent data and then merge it into another
>> model in any direct way. They're not really sharing a basis, so you
>> can't just average them.
>>
>> My experience with this aspect suggests you should try to update the
>> existing model in place on the fly. In short, you figure out how much
>> the new input ought to change your estimate of the (user,item)
>> association. Positive interactions should increase it a bit, etc. Then
>> you work out how the item vector would change if the user vector were
>> fixed in order to accomplish that change, with a bit of linear
>> algebra. Vice versa for user vector. Of course, those changes affect
>> the rest of the matrix too but that's the 'approximate' bit.
>>
>> I happen to have an implementation of this in the context of a
>> Spark ALS model, though the raw source code may be hard to read. If it's
>> of interest we can discuss offline (or online here to the extent it's
>> relevant to Spark users).
>>
>>
>> https://github.com/OryxProject/oryx/blob/91004a03413eef0fdfd6e75a61b68248d11db0e5/app/oryx-app/src/main/java/com/cloudera/oryx/app/speed/als/ALSSpeedModelManager.java#L192
>>
>> On Thu, Mar 10, 2016 at 8:01 PM, Colin Woodbury <[email protected]>
>> wrote:
>> > Hi there, I'm wondering if it's possible (or feasible) to combine the
>> > feature matrices of two MatrixFactorizationModels that share a user and
>> > product set.
>> >
>> > Specifically, one model would be the "on-going" model, and the other is one
>> > trained only on the most recent aggregation of some event data. My overall
>> > goal is to try to approximate "online" training, as ALS doesn't support
>> > streaming, and it also isn't possible to "seed" the ALS training process
>> > with an already trained model.
>> >
>> > Since the two Models would share a user/product ID space, can their feature
>> > matrices be merged? For instance via:
>> >
>> > 1. Adding feature vectors together for user/product vectors that appear in
>> > both models
>> > 2. Averaging said vectors instead
>> > 3. Some other linear algebra operation
>> >
>> > Unfortunately, I'm fairly ignorant as to the internal mechanics of ALS
>> > itself. Is what I'm asking possible?
>> >
>> > Thank you,
>> > Colin
>>
>
>
> --
>
> *Chris Fregly*
> Principal Data Solutions Engineer
> IBM Spark Technology Center, San Francisco, CA
> http://spark.tc | http://advancedspark.com
>
