Thanks Sean!

SVD is the next stop.  Thanks for all the help.  Been learning a lot the past 
few days!

Chris


On Feb 19, 2011, at 9:21 AM, Sean Owen wrote:

> Yes, this is the essential problem with some similarity metrics like
> Pearson correlation. In its pure form, it takes no account of the size
> of the data set on which the calculation is based. (That's why the
> framework has a crude variation which you can invoke with
> Weighting.WEIGHTED, to factor this in.)
> 
> I think your proposal perhaps goes far the other way, completely
> favoring "count". But it's not crazy or anything and probably works
> reasonably in some data sets.
> 
> There are many ways you could modify these stock algorithms to account
> for the effects you have in mind. Most of what's in the framework is
> just the basic ideas that come from canonical books and papers.
> 
> Here's another idea to play with: instead of weighting an item's
> score by average similarity to the user's preferred items, weight by
> average minus standard deviation. This tends to penalize candidate
> items that are similar to only a few of the user's items, since there
> will be only a few data points and the standard deviation will be larger.
> 
> Matrix factorization / SVD-based approaches are deeper magic -- more
> complex, more computation, much harder math, but theoretically more
> powerful. I'd see how far you can get on a basic user-user approach
> (or item-item) as a baseline and then go dig into these.
> 
> 
> On Sat, Feb 19, 2011 at 12:02 PM, Chris Schilling <[email protected]> wrote:
>> Hey Sean,
>> 
>> Thank you for the detailed reply.  Interesting points.  I think I have 
>> approached some of these points in my subsequent emails.
>> 
>> You bring up the case where all the users hate the same item.  What about 
>> the case where very few (a single?) similar users love a place?  In that 
>> case, is this really a better recommendation than the popular vote?  Where 
>> is the middle ground?  I think it's an interesting point.  I'll see how the 
>> SVD performs.
>> 
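
A minimal sketch of the point quoted above, that plain Pearson correlation
takes no account of how many co-rated items it is computed from. The
function names and the cutoff of 10 are hypothetical, chosen only for
illustration. The shrinkage rule shown is one common way to factor in
overlap size; it is not claimed to be what Weighting.WEIGHTED computes.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length rating lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for constant ratings; treated as 0 here.
        return 0.0
    return cov / sqrt(var_x * var_y)


def shrunk_pearson(xs, ys, cutoff=10):
    """Hypothetical size-aware variant: scale by min(n, cutoff) / cutoff.

    Assumption for illustration only, not the framework's formula.
    """
    return pearson(xs, ys) * min(len(xs), cutoff) / cutoff


# Two users who co-rated only 2 items.
few_a, few_b = [1.0, 5.0], [2.0, 4.0]

# Two users who co-rated 10 items, with ratings in a perfect linear relation.
many_a = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
many_b = [2, 3, 4, 5, 6, 2, 3, 4, 5, 6]

# Plain Pearson rates both pairs as equally similar.
print(pearson(few_a, few_b), pearson(many_a, many_b))

# The shrunk variant discounts the pair with little overlap.
print(shrunk_pearson(few_a, few_b), shrunk_pearson(many_a, many_b))
```

With the plain metric, two co-rated items are enough to reach the same
similarity as ten; the shrunk variant keeps the sign and ordering but
trusts the small overlap less.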

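A rough sketch of the "average minus standard deviation" idea quoted
above, for anyone who wants to experiment with it. Everything here is an
assumption made for illustration: the function name, the use of the sample
standard deviation, and the choice to return None when fewer than two
similarities are available (the sample standard deviation is undefined in
that case). It is not framework code.

```python
from statistics import mean, stdev


def mean_minus_std(similarities):
    """Score a candidate item from its similarities to the user's items.

    similarities: similarity values between the candidate and each of the
    user's preferred items for which a similarity could be computed.
    Returns None when there are fewer than two values (assumption: treat
    this as "not enough evidence to score").
    """
    if len(similarities) < 2:
        return None
    return mean(similarities) - stdev(similarities)


# Candidate similar to six of the user's items, with consistent values.
broad = [0.8, 0.7, 0.9, 0.8, 0.75, 0.85]

# Candidate similar to only two of the user's items, same average.
narrow = [0.95, 0.65]

broad_score = mean_minus_std(broad)
narrow_score = mean_minus_std(narrow)

# Both candidates have the same average similarity, but the one backed by
# fewer, more scattered data points ends up with the lower score.
print(broad_score, narrow_score)
```

How to treat a candidate with a single data point is a design choice; a
fixed penalty or a fallback to the popular vote would work as well as
skipping it.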