By the way, I created a JIRA ticket for supporting an initial model to warm-start
ALS here: https://issues.apache.org/jira/browse/SPARK-13856

On Fri, 11 Mar 2016 at 09:14, Nick Pentreath wrote:

> Sean's old Myrrix slides contain an overview of the fold-in math:
> http://www.slideshare.net/srowen/big-practical-recommendations-with-alternating-least-squares/14?src=clipshare
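For readers without the slides, here is the textbook least-squares fold-in. This is a sketch assuming explicit ratings. The slides may use different notation, and implicit-feedback ALS weights each term by a confidence value. The item factors Y stay fixed, and a user's vector is the least-squares solution for that user's rating row:

```latex
% Y: n_items x k item-factor matrix (held fixed); r_u: user u's 1 x n_items rating row; x_u: 1 x k
x_u = \arg\min_{x} \left\lVert r_u - x Y^{\top} \right\rVert_2^2 = r_u Y \left( Y^{\top} Y \right)^{-1}
% With L2 regularization \lambda, replace Y^{\top} Y by Y^{\top} Y + \lambda I.
```

Because the map is linear in r_u, a change \delta in a single entry r_{ui} moves the user vector by \delta \, y_i^{\top} (Y^{\top} Y)^{-1}. Items are folded in the same way, with the user matrix X in place of Y.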
>
> I never quite got around to actually incorporating it into my own
> ALS-based systems, because in the end I just re-computed models every day
> and found other ways to incorporate real-time elements using Elasticsearch.
>
>
> On Fri, 11 Mar 2016 at 01:12 Chris Fregly wrote:
>
>> @Colin: you're asking the $1 million question that a lot of people are
>> trying to answer. This was literally the #1 most-asked question in every
>> city on my recent worldwide meetup tour.
>>
>> I've been pointing people to my old Databricks co-worker's
>> streaming-matrix-factorization project:
>> https://github.com/brkyvz/streaming-matrix-factorization
>> He got tired of everyone asking about this and cranked it out over a
>> weekend. Love that guy, Burak! :)
>>
>> I've attempted (unsuccessfully, so far) to deploy exactly what you're
>> trying to do here:
>> https://github.com/fluxcapacitor/pipeline/blob/master/myapps/streaming/src/main/scala/com/advancedspark/streaming/rating/ml/TrainMFIncremental.scala
>>
>> We're a couple of pull requests away from making this happen. You can see
>> my comments and the open GitHub issues for the remaining bits.
>>
>> And this will be my focus in the next week or so as I prepare for an
>> upcoming conference.  Keep an eye on this repo if you'd like.
>>
>> @Sean: thanks for the link. I knew Oryx was doing this somehow, and I
>> kept meaning to see how you were doing it. I'll likely incorporate some of
>> your stuff into my final solution.
>>
>>
>> On Thu, Mar 10, 2016 at 3:35 PM, Sean Owen wrote:
>>
>>> While it isn't crazy, I am not sure how valid it is to build a model
>>> off of only a chunk of recent data and then merge it into another
>>> model in any direct way. The two models don't share a basis: the latent
>>> features of independently trained models can differ by a rotation, sign
>>> flip or permutation. That means you can't just average them.
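The following pure-Python sketch makes the "not sharing a basis" point concrete. The toy numbers are made up for illustration. Rotating both factor matrices by the same orthogonal matrix Q leaves every prediction X Y^T unchanged, so both versions are equally valid models. Averaging a model with its rotated twin, however, halves every score. Two real ALS runs differ by more than a clean rotation; the sketch only isolates the basis problem.

```python
"""Equivalent factorizations (up to a rotation) give identical predictions,
but averaging their factor matrices does not."""

import math


def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def average(a, b):
    """Element-wise mean of two equally shaped matrices."""
    return [[(x + y) / 2 for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def predictions(user_factors, item_factors):
    """Predicted (user, item) scores: X Y^T."""
    return matmul(user_factors, transpose(item_factors))


# Toy model with k = 2 latent features (values invented for illustration).
X = [[1.0, 2.0], [0.5, -1.0]]              # user feature vectors
Y = [[2.0, 0.0], [1.0, 1.0], [0.0, 3.0]]   # item feature vectors

# Rotate the latent space by 90 degrees; Q is orthogonal, so Q Q^T = I.
theta = math.pi / 2
Q = [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]
X_rot, Y_rot = matmul(X, Q), matmul(Y, Q)

original = predictions(X, Y)
rotated = predictions(X_rot, Y_rot)                           # same as original
averaged = predictions(average(X, X_rot), average(Y, Y_rot))  # about half of original

print(original)
print(rotated)
print(averaged)
```

Algebraically, the averaged model predicts X (I + Q)(I + Q)^T Y^T / 4. For a 90-degree rotation, Q + Q^T = 0, so this is X Y^T / 2. Other rotations distort the scores in other ways.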
>>>
>>> My experience with this aspect suggests you should instead update the
>>> existing model in place, on the fly. In short, you figure out how much
>>> the new input ought to change your estimate of the (user, item)
>>> association. Positive interactions should increase it a bit, etc. Then
>>> you work out, with a bit of linear algebra, how the item vector would
>>> need to move to produce that change if the user vector were held fixed.
>>> Vice versa for the user vector. Of course, those changes affect the rest
>>> of the matrix too, but that's the 'approximate' bit.
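The in-place update described above can be sketched as follows. This is a hedged illustration, not the Oryx code; see the linked source for its actual rule. The sketch assumes the "bit of linear algebra" is the least-squares fold-in. The item matrix Y is held fixed, and a desired change delta in one (user, item) score is pushed into the user vector as (Y^T Y)^{-1} y_i * delta. The helper names (`gram`, `solve`, `fold_in_update`) and the `target` value are hypothetical. Choosing the target is the "how much should this interaction change the estimate" step and is left to the caller.

```python
"""Approximate in-place update of one user vector after a new interaction."""


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gram(factors):
    """Return F^T F (k x k) for a factor matrix F stored as k-length rows."""
    k = len(factors[0])
    return [[sum(row[i] * row[j] for row in factors) for j in range(k)] for i in range(k)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (a is k x k)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fold_in_update(x_u, y_i, yty, target):
    """Move user vector x_u so its score for item i moves toward `target`.

    With Y fixed, (Y^T Y)^{-1} y_i * delta is the least-squares best fit to
    "raise item i's score by delta, leave the other scores alone". The score
    for item i therefore moves toward the target, reaching it exactly only
    when y_i is not explained by the other item vectors.
    """
    delta = target - dot(x_u, y_i)
    step = solve(yty, [delta * v for v in y_i])
    return [a + b for a, b in zip(x_u, step)]


# Usage with toy item vectors (k = 2); values invented for illustration.
Y = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x_u = [0.2, 0.1]
new_x = fold_in_update(x_u, Y[0], gram(Y), target=1.0)
print(dot(x_u, Y[0]), "->", dot(new_x, Y[0]))
```

The item-side update is symmetric. Hold X fixed, precompute X^T X, and call the same function with the roles of the user and item vectors swapped. In practice Y^T Y (or X^T X) would be cached and refreshed only occasionally, which is part of what makes the result approximate.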
>>>
>>> I happen to have an implementation of this in the context of a
>>> Spark ALS model, though the raw source code may be hard to read. If it's
>>> of interest, we can discuss offline (or online here, to the extent it's
>>> relevant to Spark users).
>>>
>>>
>>> https://github.com/OryxProject/oryx/blob/91004a03413eef0fdfd6e75a61b68248d11db0e5/app/oryx-app/src/main/java/com/cloudera/oryx/app/speed/als/ALSSpeedModelManager.java#L192
>>>
>>> On Thu, Mar 10, 2016 at 8:01 PM, Colin Woodbury wrote:
>>> > Hi there, I'm wondering if it's possible (or feasible) to combine the
>>> > feature matrices of two MatrixFactorizationModels that share a user and
>>> > product set.
>>> >
>>> > Specifically, one model would be the "on-going" model, and the other is
>>> > one trained only on the most recent aggregation of some event data. My
>>> > overall goal is to approximate "online" training, as ALS doesn't
>>> > support streaming, and it also isn't possible to "seed" the ALS
>>> > training process with an already-trained model.
>>> >
>>> > Since the two models would share a user/product ID space, can their
>>> > feature matrices be merged? For instance via:
>>> >
>>> > 1. Adding feature vectors together for user/product vectors that
>>> >    appear in both models
>>> > 2. Averaging said vectors instead
>>> > 3. Some other linear algebra operation
>>> >
>>> > Unfortunately, I'm fairly ignorant as to the internal mechanics of ALS
>>> > itself. Is what I'm asking possible?
>>> >
>>> > Thank you,
>>> > Colin
>>>
>>
>>
>> --
>>
>> *Chris Fregly*
>> Principal Data Solutions Engineer
>> IBM Spark Technology Center, San Francisco, CA
>> http://spark.tc | http://advancedspark.com
>>
>
