Pat,

This is an important effect, and it strongly informs how you should down-sample heavy users as well as how you should handle temporal dynamics.
On Sat, Feb 2, 2013 at 9:54 AM, Pat Ferrel <[email protected]> wrote:

> RE: Temporal effects. In CF you are interested in similarities. For
> instance, in a user-based CF recommender you want to detect users similar to
> a given user. The time decay of the similarities is likely to be very slow.
> In other words, if I bought an iPad 1 and you bought an iPad 1, the
> similarity in our taste may live on past the introduction of the iPad 3.
>
> However, the "recommendability" of the iPad 1 decays much more quickly. I'd
> suggest looking at using the recommendability decay for rescoring the
> recommendations somehow. So if you get back an iPad 1 as a recommendation,
> you might rescore it based on the mean time that you saw it in the
> preferences and decay the strength based on that.
>
> Maybe someone can come up with a better way to rescore, but the point is
> that you need to think of the two decays differently and decay preferences
> in the model building step at a different rate, if at all.
>
> On Jan 31, 2013, at 11:57 AM, Sean Owen <[email protected]> wrote:
>
> It's a good question. I think you can achieve a partial solution in Mahout.
>
> "Real-time" suggests that you won't be able to make use of
> Hadoop-based implementations, since they are by nature big batch
> processes.
>
> All of the implementations accept the same input -- user,item,value.
> That's OK; you can probably just reduce all of your user-thing
> interactions to tuples like this. Any reasonable mapping should be OK.
> Tags can be items too.
>
> I don't think any of the implementations take advantage of time.
>
> The non-Hadoop implementations are not-quite-realtime. The model
> loads data into memory from a backing store, computes and maybe
> caches partial results, and serves results as quickly as possible.
> New input can't be used immediately, no. It comes into play only when
> the model is reloaded.
>
> I think you have very sparse input -- a high number of users and
> "items" (tags, likes), but relatively few interactions. Matrix
> factorization / latent factor models work well here. The ones in
> Mahout that are not Hadoop-based may work for you, like
> SVDRecommender. It's worth a try.
>
> (Advertisement: the new recommender product I am commercializing,
> Myrrix, does the real-time and matrix factorization thing just fine.
> It's easy enough to start with that I would encourage you to
> experiment with the open source system also:
> http://myrrix.com/download/)
>
> On Thu, Jan 31, 2013 at 7:02 PM, Frederik Kraus
> <[email protected]> wrote:
> > Hi Guys,
> >
> > I'm rather new to the whole Mahout ecosystem, so please excuse me if the
> > questions I have are rather dumb ;)
> >
> > Our "problem" basically boils down to this: we want to match users with
> > either the content they are interested in and/or the content they could
> > contribute to. To do this "matching" we have several dimensions of both
> > users and content items (things like: contribution history, tags, browsing
> > history, diggs, likes, …).
> >
> > As users' interests can change over time, some kind of CF algorithm
> > including temporal effects would obviously be best, but for the time being
> > those effects could probably be neglected.
> >
> > Now my questions:
> >
> > - What algorithm from the Mahout "toolkit" would best fit our case?
> > - How can we get this near real-time, i.e. without having to recalculate
> > the entire model when user dimensions change and/or new content is added
> > to the system (or updated)?
> > - How would we model the user and item vectors (especially things like
> > "tags")?
> > - Any hints on where to start? ;)
> >
> > Thanks a lot!
> >
> > Fred.
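Sean's point that every Mahout recommender consumes the same user,item,value input, and that tags can be items too, can be illustrated with a small mapping step. This is a minimal, hypothetical Java sketch: the interaction kinds, the weights in `weight()`, and `TAG_ID_OFFSET` (which keeps tag IDs from colliding with content IDs) are assumptions chosen for illustration, not anything Mahout prescribes.

```java
import java.util.List;

/**
 * Sketch: flatten heterogeneous user interactions into (user, item, value)
 * tuples. Interaction kinds, weights and the tag ID offset are hypothetical.
 */
public class InteractionTuples {

    // Hypothetical interaction kinds, loosely following the original question.
    enum Kind { CONTRIBUTION, LIKE, BROWSE, TAG }

    record Interaction(long userId, Kind kind, long targetId) {}

    record Tuple(long userId, long itemId, double value) {
        // One comma-separated line: user,item,value
        String toCsv() { return userId + "," + itemId + "," + value; }
    }

    // Assumed offset that gives tags their own ID range, separate from content items.
    static final long TAG_ID_OFFSET = 1_000_000_000L;

    // Assumed weights; any reasonable, consistent mapping should do.
    static double weight(Kind kind) {
        switch (kind) {
            case CONTRIBUTION: return 3.0;
            case LIKE: return 2.0;
            case TAG: return 1.5;
            default: return 1.0; // BROWSE
        }
    }

    static Tuple toTuple(Interaction in) {
        // Tags are treated as items, shifted into their own ID range.
        long item = in.kind() == Kind.TAG ? TAG_ID_OFFSET + in.targetId() : in.targetId();
        return new Tuple(in.userId(), item, weight(in.kind()));
    }

    public static void main(String[] args) {
        List<Interaction> log = List.of(
            new Interaction(1, Kind.LIKE, 42),
            new Interaction(1, Kind.TAG, 7),
            new Interaction(2, Kind.CONTRIBUTION, 42));
        for (Interaction in : log) {
            System.out.println(toTuple(in).toCsv());
        }
    }
}
```

Summing or taking the maximum when one user touches the same item in several ways is a further design choice this sketch leaves open.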
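Pat's suggestion is to leave preferences undecayed (or only slowly decayed) for similarity computation, but to decay an item's "recommendability" when rescoring. It might look roughly like the following. This is a sketch under stated assumptions: it applies exponential decay with a hypothetical half-life to the age of the mean time each item appeared in the preference data, as Pat describes, and the class and method names are made up. In Mahout's Taste API, logic like `rescore()` could presumably be wrapped in an `IDRescorer` and passed to `Recommender.recommend(userID, howMany, rescorer)`. The block below, however, is plain Java with no Mahout dependency.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch: decay an item's "recommendability" by how long ago it was, on
 * average, seen in the preference data. Half-life and timestamps are hypothetical.
 */
public class RecommendabilityDecay {

    // itemId -> {sum of preference timestamps, number of preferences}
    private final Map<Long, double[]> sumAndCount = new HashMap<>();
    private final double halfLifeMillis;

    RecommendabilityDecay(double halfLifeMillis) {
        this.halfLifeMillis = halfLifeMillis;
    }

    // Record one preference timestamp for an item.
    void observe(long itemId, long timestampMillis) {
        double[] acc = sumAndCount.computeIfAbsent(itemId, k -> new double[2]);
        acc[0] += timestampMillis;
        acc[1] += 1;
    }

    // Mean time at which the item appeared in the preferences (NaN if never seen).
    double meanTimestamp(long itemId) {
        double[] acc = sumAndCount.get(itemId);
        return acc == null ? Double.NaN : acc[0] / acc[1];
    }

    // Exponential decay: the score halves for every half-life of age.
    double rescore(long itemId, double originalScore, long nowMillis) {
        double mean = meanTimestamp(itemId);
        if (Double.isNaN(mean)) {
            return originalScore; // no history: leave the score untouched
        }
        double age = Math.max(0.0, nowMillis - mean);
        return originalScore * Math.pow(0.5, age / halfLifeMillis);
    }

    public static void main(String[] args) {
        long day = 24L * 60 * 60 * 1000;
        RecommendabilityDecay decay = new RecommendabilityDecay(30.0 * day);
        long now = 365L * day;
        decay.observe(1L, now - 90L * day); // e.g. an older product, seen long ago
        decay.observe(1L, now - 90L * day);
        decay.observe(3L, now);             // e.g. a newer product, seen today
        System.out.println(decay.rescore(1L, 1.0, now)); // three half-lives old
        System.out.println(decay.rescore(3L, 1.0, now)); // no decay
    }
}
```

Because the decay lives only in the rescoring step, the slowly decaying taste similarity Pat describes is left untouched; the half-life would need tuning for a given catalog.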
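Sean's matrix factorization suggestion can be made concrete with a generic latent factor model trained by stochastic gradient descent on sparse (user, item, value) tuples. This is a textbook technique, not Mahout's SVDRecommender or any of its factorizers. The rank `k`, learning rate, regularization `lambda`, and toy data are all assumptions. Only observed cells contribute to training, which is part of why such models suit very sparse input.

```java
import java.util.Random;

/**
 * Sketch of a latent factor model fitted by SGD on observed (user, item, value)
 * tuples. Hyperparameters and data are hypothetical.
 */
public class TinyFactorizer {
    final double[][] userF;
    final double[][] itemF;
    final double learningRate;
    final double lambda;

    TinyFactorizer(int numUsers, int numItems, int k, double learningRate, double lambda, long seed) {
        Random rnd = new Random(seed);
        userF = new double[numUsers][k];
        itemF = new double[numItems][k];
        // Small random initial factors.
        for (double[] row : userF) for (int f = 0; f < k; f++) row[f] = 0.1 * rnd.nextGaussian();
        for (double[] row : itemF) for (int f = 0; f < k; f++) row[f] = 0.1 * rnd.nextGaussian();
        this.learningRate = learningRate;
        this.lambda = lambda;
    }

    // Predicted value: dot product of user and item factor vectors.
    double predict(int u, int i) {
        double dot = 0;
        for (int f = 0; f < userF[u].length; f++) dot += userF[u][f] * itemF[i][f];
        return dot;
    }

    // One SGD pass over the observed tuples; unobserved cells are never visited.
    void epoch(int[][] pairs, double[] values) {
        for (int n = 0; n < values.length; n++) {
            int u = pairs[n][0];
            int i = pairs[n][1];
            double err = values[n] - predict(u, i);
            for (int f = 0; f < userF[u].length; f++) {
                double pu = userF[u][f];
                double qi = itemF[i][f];
                // Gradient step on squared error with L2 regularization.
                userF[u][f] += learningRate * (err * qi - lambda * pu);
                itemF[i][f] += learningRate * (err * pu - lambda * qi);
            }
        }
    }

    // Root-mean-square error over the observed tuples.
    double rmse(int[][] pairs, double[] values) {
        double sq = 0;
        for (int n = 0; n < values.length; n++) {
            double e = values[n] - predict(pairs[n][0], pairs[n][1]);
            sq += e * e;
        }
        return Math.sqrt(sq / values.length);
    }

    public static void main(String[] args) {
        int[][] pairs = {{0, 0}, {0, 1}, {1, 0}, {1, 2}, {2, 1}, {2, 2}};
        double[] values = {3.0, 2.0, 3.0, 1.0, 2.0, 1.5};
        TinyFactorizer mf = new TinyFactorizer(3, 3, 2, 0.05, 0.01, 42L);
        System.out.println("before: " + mf.rmse(pairs, values));
        for (int e = 0; e < 500; e++) mf.epoch(pairs, values);
        System.out.println("after:  " + mf.rmse(pairs, values));
        // Score a user-item pair that was never observed.
        System.out.println("user 0, item 2: " + mf.predict(0, 2));
    }
}
```

Once trained, `predict()` gives a score for any user-item pair, including unobserved ones; ranking those scores per user yields recommendations.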
