Thanks Sean! SVD is the next stop. Thanks for all the help. Been learning a lot the past few days!
Chris

On Feb 19, 2011, at 9:21 AM, Sean Owen wrote:

> Yes, this is the essential problem with some similarity metrics like
> Pearson correlation. In its pure form, it takes no account of the size
> of the data set on which the calculation is based. (That's why the
> framework has a crude variation, which you can invoke with
> Weighting.WEIGHTED, to factor this in.)
>
> I think your proposal perhaps goes too far the other way, completely
> favoring "count". But it's not crazy or anything, and it probably works
> reasonably on some data sets.
>
> There are many ways you could modify these stock algorithms to account
> for the effects you have in mind. Most of what's in the framework is
> just the basic ideas that come from canonical books and papers.
>
> Here's another idea to play with: instead of weighting an item's
> score by average similarity to the user's preferred items, weight by
> average minus standard deviation. This tends to penalize candidate
> items that are similar to only a few of the user's items, since there
> will be only a few data points and the standard deviation will be larger.
>
> Matrix factorization / SVD-based approaches are deeper magic -- more
> complex, more computation, much harder math, but theoretically more
> powerful. I'd see how far you can get on a basic user-user approach
> (or item-item) as a baseline and then go dig into these.
>
>
> On Sat, Feb 19, 2011 at 12:02 PM, Chris Schilling <[email protected]> wrote:
>> Hey Sean,
>>
>> Thank you for the detailed reply. Interesting points. I think I have
>> approached some of these points in my subsequent emails.
>>
>> You bring up the case where all the users hate the same item. What about
>> the case where very few (a single?) similar users love a place? In that
>> case, is this really a better recommendation than the popular vote? Where
>> is the middle ground? I think it's an interesting point. I'll see how the
>> SVD performs.
>>
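[Editor's note: the mean-minus-standard-deviation scoring Sean suggests can be sketched in a few lines of Python. The function name and the similarity values below are made up for illustration; the thread does not specify any concrete numbers or API.]

```python
import statistics

def penalized_score(similarities):
    """Score a candidate item by the mean of its similarities to the
    user's preferred items, minus the (population) standard deviation
    of those similarities. Candidates backed by few or inconsistent
    similarity values get pulled down relative to candidates backed
    by many consistent ones."""
    return statistics.mean(similarities) - statistics.pstdev(similarities)

# Candidate similar to many of the user's items, consistently:
broad = [0.60, 0.62, 0.58, 0.61, 0.59]
# Candidate similar to only two items, one of them strongly:
narrow = [0.90, 0.30]

# Both have mean similarity 0.60, but the narrow candidate's large
# spread (pstdev = 0.30) drops its score to 0.30, while the broad
# candidate keeps a score close to its mean.
print(penalized_score(broad))
print(penalized_score(narrow))
```

Note that this uses the population standard deviation so a single-element list scores as its bare mean; with the sample standard deviation (`statistics.stdev`), one data point would raise an error and would need a guard.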
