Indeed, please elaborate. I'm not sure what you mean by "this is an important effect."
Do you disagree with what I said re temporal decay?

As to downsampling, or rather reweighting, outliers among popular items and/or very active users: that's another interesting question. Does the fact that we both like puppies and motherhood make us in any real way similar? I'm quite interested in ways to account for this. I've seen what is done to normalize ratings from different users based on whether they tend to rate high or low, but I'm interested in any papers about the super-active user or the super-popular item.

Another subject of interest is this question: is it possible to create a blend of recommenders based on their performance on long-tail items? For instance, if the precision of a recommender (considering just item-item similarity for the present) decreases as a function of item popularity towards the long tail, is it possible that one type of recommender does better than another -- do the distributions cross? That would suggest a blending strategy based on how far out the long tail you are when calculating similar items. I haven't found any papers on this, but perhaps there are some; I think I'll experiment with it if I don't find existing research.

On Feb 2, 2013, at 11:44 AM, Ted Dunning <[email protected]> wrote:

Pat,

This is an important effect, and it strongly informs how you should down-sample heavy users as well as how you should handle temporal dynamics.

On Sat, Feb 2, 2013 at 9:54 AM, Pat Ferrel <[email protected]> wrote:

> RE: temporal effects. In CF you are interested in similarities. For
> instance, in a user-based CF recommender you want to detect users similar
> to a given user. The time decay of those similarities is likely to be very
> slow. In other words, if I bought an iPad 1 and you bought an iPad 1, the
> similarity in our taste may live on past the introduction of the iPad 3.
>
> However, the "recommendability" of the iPad 1 decays much more quickly.
> I'd suggest using the recommendability decay to rescore the
> recommendations somehow. So if you get back an iPad 1 as a recommendation,
> you might rescore it based on the mean time at which it appeared in the
> preferences, and decay the strength based on that.
>
> Maybe someone can come up with a better way to rescore, but the point is
> that you need to think of the two decays differently and decay preferences
> in the model-building step at a different rate, if at all.
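A minimal sketch of that rescoring idea against Mahout's Taste API, for anyone following along. RecencyRescorer and the half-life decay are illustrative choices, not anything shipped in Mahout, and it assumes preference timestamps are available in milliseconds via DataModel.getPreferenceTime:

import java.util.concurrent.TimeUnit;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.PreferenceArray;
import org.apache.mahout.cf.taste.recommender.IDRescorer;

/**
 * Decays an item's score by how long ago, on average, users expressed a
 * preference for it. The similarity model is left untouched; only the final
 * "recommendability" is penalized.
 */
public class RecencyRescorer implements IDRescorer {

  private final FastByIDMap<Double> meanAgeDays = new FastByIDMap<Double>();
  private final double halfLifeDays; // e.g. 90: score halves every 90 days

  public RecencyRescorer(DataModel model, double halfLifeDays) throws TasteException {
    this.halfLifeDays = halfLifeDays;
    long now = System.currentTimeMillis();
    LongPrimitiveIterator items = model.getItemIDs();
    while (items.hasNext()) {
      long itemID = items.nextLong();
      PreferenceArray prefs = model.getPreferencesForItem(itemID);
      double totalAgeDays = 0.0;
      int n = 0;
      for (int i = 0; i < prefs.length(); i++) {
        Long t = model.getPreferenceTime(prefs.getUserID(i), itemID);
        if (t != null) { // timestamps may be absent for some preferences
          totalAgeDays += (now - t) / (double) TimeUnit.DAYS.toMillis(1);
          n++;
        }
      }
      if (n > 0) {
        meanAgeDays.put(itemID, totalAgeDays / n);
      }
    }
  }

  @Override
  public double rescore(long itemID, double originalScore) {
    Double age = meanAgeDays.get(itemID);
    // exponential decay with the given half-life; unknown items pass through
    return age == null ? originalScore : originalScore * Math.pow(0.5, age / halfLifeDays);
  }

  @Override
  public boolean isFiltered(long itemID) {
    return false; // decay the score, don't exclude the item outright
  }
}

At query time you'd pass it in, e.g. recommender.recommend(userID, 10, new RecencyRescorer(model, 90)), which keeps the underlying similarity model undecayed while penalizing stale items.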
> On Jan 31, 2013, at 11:57 AM, Sean Owen <[email protected]> wrote:
>
> It's a good question. I think you can achieve a partial solution in Mahout.
>
> "Real-time" suggests that you won't be able to make use of the
> Hadoop-based implementations, since they are by nature big batch
> processes.
>
> All of the implementations accept the same input: user,item,value.
> You can probably just reduce all of your user-thing interactions to
> tuples like this; any reasonable mapping should be OK. Tags can be
> items too.
>
> I don't think any of the implementations take advantage of time.
>
> The non-Hadoop implementations are not-quite-real-time. The model loads
> data into memory from a backing store, computes and maybe caches partial
> results, and serves results as quickly as possible. New input can't be
> used immediately, no; it comes into play only when the model is reloaded.
>
> I think you have very sparse input -- a high number of users and
> "items" (tags, likes) but relatively few interactions. Matrix
> factorization / latent factor models work well here. The ones in
> Mahout that are not Hadoop-based, like SVDRecommender, may work for
> you. It's worth a try.
>
> (Advertisement: the new recommender product I am commercializing,
> Myrrix, does the real-time and matrix factorization thing just fine.
> It's easy enough to start with that I would encourage you to
> experiment with the open-source system as well:
> http://myrrix.com/download/)
>
> On Thu, Jan 31, 2013 at 7:02 PM, Frederik Kraus <[email protected]> wrote:
>> Hi guys,
>>
>> I'm rather new to the whole Mahout ecosystem, so please excuse me if
>> the questions I have are rather dumb ;)
>>
>> Our "problem" basically boils down to this: we want to match users with
>> either the content they're interested in and/or the content they could
>> contribute to. To do this "matching" we have several dimensions of both
>> users and content items (things like contribution history, tags,
>> browsing history, diggs, likes, ...).
>>
>> As users' interests can change over time, some kind of CF algorithm
>> including temporal effects would obviously be best, but for the time
>> being those effects could probably be neglected.
>>
>> Now my questions:
>>
>> - What algorithm from the Mahout "toolkit" would best fit our case?
>> - How can we get this near-real-time, i.e. avoid having to recalculate
>> the entire model when user dimensions change and/or new content is
>> added to the system (or updated)?
>> - How would we model the user and item vectors (especially things like
>> "tags")?
>> - Any hints on where to start? ;)
>>
>> Thanks a lot!
>>
>> Fred.
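To make Sean's point about reducing everything to user,item,value tuples concrete, and to answer the "how do we model tags" question: one common trick is to give tags their own IDs in the same item space and weight each interaction type. A rough sketch; the ID offset and the weights are made-up values you'd have to tune, not anything Mahout prescribes:

import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

/** Flattens heterogeneous events into Mahout's user,item,value input. */
public class EventFlattener {

  // Keep tag IDs from colliding with content-item IDs.
  private static final long TAG_ID_OFFSET = 1000000000L;

  // Rough relative strength of each interaction type; tune on held-out data.
  static double weight(String eventType) {
    if ("contribution".equals(eventType)) return 5.0;
    if ("like".equals(eventType) || "digg".equals(eventType)) return 3.0;
    if ("tag".equals(eventType)) return 2.0;
    return 1.0; // views and everything else
  }

  public static void main(String[] args) throws IOException {
    PrintWriter out = new PrintWriter(new FileWriter("prefs.csv"));
    // user 42 liked content item 7, and applied tag 99 somewhere:
    out.printf("%d,%d,%.1f%n", 42L, 7L, weight("like"));                 // 42,7,3.0
    out.printf("%d,%d,%.1f%n", 42L, TAG_ID_OFFSET + 99L, weight("tag")); // 42,1000000099,2.0
    out.close();
  }
}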

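From there, a minimal non-Hadoop setup along the lines Sean suggests, using SVDRecommender with ALS factorization. The file name and the factorization parameters are placeholders, and note that the refresh() call only approximates real time by reloading the model:

import java.io.File;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.svd.ALSWRFactorizer;
import org.apache.mahout.cf.taste.impl.recommender.svd.SVDRecommender;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;

public class SvdExample {
  public static void main(String[] args) throws Exception {
    // user,item,value tuples, e.g. as produced by the flattening sketch above
    DataModel model = new FileDataModel(new File("prefs.csv"));

    // 20 latent features, lambda = 0.05, 10 ALS iterations: starting points only
    Recommender rec = new SVDRecommender(model, new ALSWRFactorizer(model, 20, 0.05, 10));

    for (RecommendedItem item : rec.recommend(42L, 10)) {
      System.out.println(item.getItemID() + " : " + item.getValue());
    }

    // Not truly real time: new input is picked up only when the model refreshes.
    rec.refresh(null);
  }
}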