The Pearson correlation is not a distance metric; it points the wrong
way. A distance needs to be 0 when two things are identical, whereas
the correlation is 1 there. "1 - correlation" can work. I am not sure
that is a proper distance metric, but you often don't need it to be
proper and obey the triangle inequality.
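
A minimal sketch of that dissimilarity in Python (numpy assumed; the
rating vectors are made up):

import numpy as np

def correlation_distance(a, b):
    # Pearson correlation is 1.0 for identical vectors, so
    # 1 - correlation is 0.0 there, as a distance should be.
    # Perfectly anti-correlated vectors come out at 2.0.
    r = np.corrcoef(a, b)[0, 1]
    return 1.0 - r

a = np.array([5.0, 4.0, 1.0, 2.0])
b = np.array([1.0, 2.0, 5.0, 4.0])
print(correlation_distance(a, a))  # 0.0: identical
print(correlation_distance(a, b))  # 2.0: opposite tastes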

A point in space defines a vector / direction. You can project all
your data onto that vector, and then you have put everything on a 1-D
axis. It sounds like you want to start with one of those points, like
"the centroid of all Star Trek movies". You can do this by dotting
each user's item-preference vector with the vector that has a "1" for
the Star Trek movies only and a 0 everywhere else. This will tell you,
well, the extent to which people have rated the Star Trek movies.
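
A sketch of that dot product, with a small hypothetical user-item
ratings matrix R (rows are users, columns are items; the "Star Trek"
column indices are made up):

import numpy as np

# Hypothetical ratings matrix: 4 users x 5 items (0 = unrated).
R = np.array([[5, 4, 0, 1, 0],
              [4, 5, 1, 0, 0],
              [0, 0, 5, 4, 2],
              [1, 0, 4, 5, 3]], dtype=float)

# Indicator vector: 1 in the (made-up) Star Trek columns, 0 elsewhere.
star_trek = np.zeros(R.shape[1])
star_trek[[0, 1]] = 1.0

# Dotting each user's preference row with the indicator puts everyone
# on a 1-D axis: how much they have rated the Star Trek movies.
scores = R @ star_trek
print(scores)  # [9. 9. 0. 1.]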

It probably gets more interesting when you project the data
intelligently into a lower-dimensional space. There you might pick up
that people who rate Star Wars end up projecting near the Star Trek
watchers. The SVD (more or less what PCA computes) is good, even
overkill, if this is all you want; ALS is a fast and simple way to get
a smart-ish projection into lower dimensions.
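
Here is a sketch of that projection using a plain truncated SVD in
numpy; ALS would yield similar factors but is cheaper at scale and
handles missing ratings more gracefully:

import numpy as np

# Same style of hypothetical ratings matrix as above.
R = np.array([[5, 4, 0, 1, 0],
              [4, 5, 1, 0, 0],
              [0, 0, 5, 4, 2],
              [1, 0, 4, 5, 3]], dtype=float)

# Factor R, then keep only the top-k singular vectors: a rank-k projection.
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
user_factors = U[:, :k] * s[:k]  # each user as a point in k dimensions
item_factors = Vt[:k, :].T       # each item as a point in k dimensions

# Items that tend to be rated by the same people land near each other
# here, which is how Star Wars could end up near the Star Trek movies.
print(item_factors)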

The things that project far from Star Trek may or may not be
thematically coherent; there are many "things not like Star Trek". For
example, if you operated in the original space, you'd find a huge
clump of people at 0 -- these are people who have simply never watched
a Star Trek movie, and there are plenty of different types of those.
But in some cases, sure, you may find an interesting thematic clump at
the opposite end.

So far, nothing here requires more than one axis or dimension. It
sounds like you want to discover clusters, which is just clustering,
and not something that necessarily involves any projection or matrix
factorization. Then see what things look like when you project onto
the axis defined by each cluster in feature space. To start, just
cluster with k-means, using the distance metric above or something
similar.
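
A sketch of that starting point (scikit-learn assumed; the rows are
hypothetical item vectors). Centering and normalizing each row first
makes squared Euclidean distance equal to 2 * (1 - Pearson
correlation), so plain k-means then clusters by the correlation
distance above:

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical feature matrix: one row per movie.
X = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [0, 1, 5, 4],
              [1, 0, 4, 5]], dtype=float)

# Center and normalize each row so Euclidean k-means matches
# the 1 - correlation distance.
Xc = X - X.mean(axis=1, keepdims=True)
Xc /= np.linalg.norm(Xc, axis=1, keepdims=True)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Xc)
print(labels)  # e.g. [0 0 1 1]: two taste clusters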

On Tue, Nov 27, 2012 at 1:55 AM, Lance Norskog <[email protected]> wrote:
> There is a problem I've wanted to solve for a long time. Suppose you want to
> find antipodes in preferences: "axes of interest". In movie preferences, one
> axis might be Star Trek movies (male nerds) vs. Sex and the City (middle-class
> women); another might be historical documentaries vs. 1950s Douglas Sirk
> melodramas (don't ask). These axes are not orthogonal. (I saw this analysis
> in a presentation by one of the Netflix Competition finalists. Unfortunately,
> I did not ask him how to make it.)
>
> Thank you for this hint. Negative correlations make this possible. Given an 
> item-item matrix of Pearson distances, how would you isolate these axes? The 
> minimum and maximum movies are easy to find. Each axis endpoint is a small 
> cluster inside a genre. How would you find these small clusters? They're not 
> orthogonal, so a naive SVD would not help. What is a good algorithm for this?
>
> Lance
>
> ----- Original Message -----
> | From: "Paulo Villegas" <[email protected]>
> | To: [email protected]
> | Sent: Monday, November 26, 2012 2:03:59 PM
> | Subject: Re: Recommender's formula
> |
> | [...]
> |
> | They can be negative for certain similarity metrics, most notably
> | Pearson (which is signed: negative similarities express negative
> | correlations); other similarity metrics are strictly positive and
> | therefore do not present that problem.
> |
> | [...]
