Re: (near) real time recommender/predictor

Ted Dunning Sat, 02 Feb 2013 11:45:34 -0800

Pat,

This is an important effect and it strongly informs how you should
down-sample heavy users as well as how you should handle temporal dynamics.
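
One way to read "down-sample heavy users", sketched in Python: cap the
number of interactions kept per user and sample uniformly when a user
exceeds the cap. The function name, the max_per_user parameter, and the
uniform sampling are illustrative assumptions, not anything Mahout ships
under these names.

```python
import random
from collections import defaultdict


def downsample_heavy_users(interactions, max_per_user, seed=0):
    """Keep at most max_per_user (user, item, value) rows for each user."""
    rng = random.Random(seed)  # fixed seed so the sample is repeatable

    # Group the rows by user.
    by_user = defaultdict(list)
    for user, item, value in interactions:
        by_user[user].append((user, item, value))

    kept = []
    for rows in by_user.values():
        if len(rows) > max_per_user:
            # Heavy user: keep a uniform random subset of their rows.
            rows = rng.sample(rows, max_per_user)
        kept.extend(rows)
    return kept
```

If temporal dynamics matter, keeping each heavy user's most recent
interactions instead of a uniform sample is an equally plausible variation.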


On Sat, Feb 2, 2013 at 9:54 AM, Pat Ferrel <[email protected]> wrote:

> RE: Temporal effects. In CF you are interested in similarities. For
> instance in a User-based CF recommender you want to detect users similar to
> a given user. The time decay of the similarities is likely to be very slow.
> In other words, if I bought an iPad 1 and you bought an iPad 1, the
> similarity in our taste may live on past the introduction of the iPad 3.
>
> However, the "recommendability" of the iPad 1 decays much more quickly. I'd
> suggest looking at using the recommendability decay for rescoring the
> recommendations somehow. So if you get back an iPad 1 as a recommendation
> you might rescore it based on the mean time that you saw it in the
> preferences and decay the strength based on that.
>
> Maybe someone can come up with a better way to rescore but the point is
> that you need to think of the two decays differently and decay preferences
> in the model building step at a different rate, if at all.
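
As a rough sketch of the rescoring idea above (one possible reading, not a
description of anything built in): multiply each recommendation's score by
an exponential decay on the item's age, where age is measured from the mean
timestamp at which the item appeared in preferences. The half-life value and
all names below are hypothetical. In Mahout's Taste API, logic like this
would normally be plugged in through an IDRescorer; the Python here is
library-independent.

```python
SECONDS_PER_DAY = 86400.0


def rescore_by_recency(recommendations, mean_pref_time, now,
                       half_life_days=180.0):
    """Decay each (item, score) by the age of the item's mean preference time.

    recommendations: list of (item, score) pairs from the recommender.
    mean_pref_time:  dict mapping item -> mean timestamp (seconds) at which
                     the item was seen in preferences.
    now:             current timestamp in seconds.
    """
    rescored = []
    for item, score in recommendations:
        age_days = max(0.0, (now - mean_pref_time[item]) / SECONDS_PER_DAY)
        # The score halves every half_life_days of item age.
        decay = 0.5 ** (age_days / half_life_days)
        rescored.append((item, score * decay))

    # Strongest rescored recommendation first.
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored
```

The similarity model itself is left untouched; only the final ranking is
decayed, which keeps the two decay rates separate.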
>
> On Jan 31, 2013, at 11:57 AM, Sean Owen <[email protected]> wrote:
>
> It's a good question. I think you can achieve a partial solution in Mahout.
>
> "Real-time" suggests that you won't be able to make use of
> Hadoop-based implementations, since they are by nature big batch
> processes.
>
> All of the implementations accept the same input -- user,item,value.
> That's OK; you can probably just reduce all of your user-thing
> interactions to tuples like this. Any reasonable mapping should be OK.
> Tags can be items too.
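
A minimal sketch of reducing mixed interactions to user,item,value tuples,
with tags sharing the item ID space. The event kinds, the weights, and the
choice to keep the strongest signal per (user, item) pair are assumptions
for illustration. Numeric IDs are used because Mahout's file-based data
models expect numeric user and item IDs.

```python
# Hypothetical strength assigned to each kind of interaction.
EVENT_WEIGHTS = {"view": 0.5, "like": 1.0, "digg": 1.0, "tag": 1.0}


def to_tuples(events, weights=EVENT_WEIGHTS):
    """Turn (user, kind, target) events into (user_id, item_id, value) rows.

    Tags and content live in one item ID space but in separate namespaces,
    so a tag named "hadoop" never collides with content named "hadoop".
    """
    user_ids, item_ids = {}, {}
    prefs = {}
    for user, kind, target in events:
        namespace = "tag" if kind == "tag" else "content"
        uid = user_ids.setdefault(user, len(user_ids))
        iid = item_ids.setdefault((namespace, target), len(item_ids))
        # Keep only the strongest signal seen for this (user, item) pair.
        prefs[(uid, iid)] = max(prefs.get((uid, iid), 0.0), weights[kind])

    rows = [(u, i, v) for (u, i), v in sorted(prefs.items())]
    return rows, user_ids, item_ids
```

The returned ID maps are needed later to translate recommended item IDs
back into content or tags.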
>
> I don't think any of the implementations take advantage of time.
>
> The non-Hadoop implementations are not-quite-realtime. The model loads
> data into memory from a backing store, computes and maybe caches partial
> results, and serves results as quickly as possible. New input can't be
> used immediately; it only comes into play when the model is reloaded.
>
> I think you have very sparse input -- a high number of users and
> "items" (tags, likes), but relatively few interactions. Matrix
> factorization / latent factor models work well here. The ones in
> Mahout that are not Hadoop-based may work for you, like
> SVDRecommender. It's worth a try.
>
> (Advertisement: the new recommender product I am commercializing,
> Myrrix, does the real-time and matrix factorization thing just fine.
> It's easy enough to start with that I would encourage you to
> experiment with the open source system also:
> http://myrrix.com/download/)
>
>
>
> On Thu, Jan 31, 2013 at 7:02 PM, Frederik Kraus
> <[email protected]> wrote:
> > Hi Guys,
> >
> > I'm rather new to the whole Mahout ecosystem, so please excuse if the
> > questions I have are rather dumb ;)
> >
> > Our "problem" basically boils down to this: we want to match users with
> > either the content they are interested in and/or the content they could
> > contribute to. To do this "matching" we have several dimensions both of
> > users and content items (things like: contribution history, tags,
> > browsing history, diggs, likes, ….).
> >
> > As users' interests can change over time, some kind of CF algorithm
> > that includes temporal effects would obviously be best, but for the time
> > being those effects could probably be neglected.
> >
> > Now my questions:
> >
> > - What algorithm from the Mahout "toolkit" would best fit our case?
> > - How can we get this near real time, i.e. not having to recalculate the
> >   entire model when user dimensions change and/or new content is being
> >   added to the system (or updated)?
> > - How would we model the user and item vectors (especially things like
> >   "tags")?
> > - Any hints on where to start? ;)
> >
> > Thanks a lot!
> >
> > Fred.
> >
>
>
