Hi Vinayak:
scipy.stats implements pearsonr() that way because it is a general statistics
routine: it treats a 0 in the input data as an actual value of 0.
But in the context of recommender systems, "unrated" is different from
a score of 0 (though we usually use 0 to represent "unrated" when scores must
be positive). And strictly speaking, Pearson correlation here is
*defined* only on the items that both users have rated.
You can use routines like numpy.nonzero() to get the indices of the items
both users have rated, and then call pearsonr() from scipy.stats on just
those entries.
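A minimal sketch of that approach, assuming ratings on a positive scale with 0 meaning "unrated" (the two rating vectors are made up for illustration). It computes Pearson's r once on the co-rated items only and once on the full vectors, where every 0 counts as a real score:

```python
import numpy as np
from scipy.stats import pearsonr

# Illustrative ratings on a 1-5 scale; 0 marks an unrated item (assumption).
u = np.array([5, 3, 0, 4, 0, 2], dtype=float)
v = np.array([4, 0, 1, 5, 2, 1], dtype=float)

# Indices of the items that both users have rated.
common = np.nonzero((u != 0) & (v != 0))[0]   # -> array([0, 3, 5])

# Pearson's r restricted to the co-rated items.
r_common, _ = pearsonr(u[common], v[common])

# Pearson's r on the raw vectors: every 0 is treated as an actual score of 0.
r_naive, _ = pearsonr(u, v)
```

Here the two values differ noticeably (about 0.84 versus about 0.58), which is exactly the effect of counting "unrated" as 0.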
In my opinion, adding such options to pearsonr() would confuse users who
are not working on recommender systems and make it deviate from its
original definition.
Boyuan
On 03/23/2015 05:57 AM, Vinayak Mehta wrote:
@Gaël
> I believe that it is the same thing as cosine similarity. If that's
> indeed the case, you could add a note in the cosine similarity docstring
> to stress it.
I think it is somewhat different from cosine similarity.
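For reference, the relationship can be checked numerically: Pearson's r equals the cosine similarity of the *mean-centered* vectors, while cosine similarity on the raw scores is generally a different number. A small sketch (the `cosine` helper and the sample vectors are illustrative, not an existing API):

```python
import numpy as np
from scipy.stats import pearsonr

def cosine(a, b):
    """Cosine similarity of two 1-D vectors (illustrative helper)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

x = np.array([5.0, 4.0, 2.0])
y = np.array([4.0, 5.0, 1.0])

cos_raw = cosine(x, y)                              # cosine on raw scores
cos_centered = cosine(x - x.mean(), y - y.mean())   # cosine after mean-centering
r, _ = pearsonr(x, y)

# cos_centered matches r up to floating-point error; cos_raw does not.
```

So the two coincide only when the ratings are centered first, which is why they give different similarities on raw rating vectors.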
@Boyuan
> I remember there is an off-the-shelf function in scipy.stats called
> pearsonr. You don't have to implement it on your own.
Yeah, I know about that. I thought of suggesting this addition after I
saw that we have our own newton_cg even though scipy has fmin_ncg. :)
Besides, we could do things differently. For example, in my project I
needed to calculate the mean over only the items a user has rated. Looking
at pearsonr's implementation, I saw that it also includes the unrated
items (which have a rating of zero) when calculating this mean. We could
add an option that lets the user choose this behavior. Also, as far as I
can tell, it has no way to restrict the computation to the items that
both users have rated. We could add that as an option too.
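A minimal sketch of the two options described above, written as a standalone helper rather than a change to scipy's pearsonr. The function name `rating_pearson` and its `missing` and `user_means` parameters are hypothetical and do not exist in scipy or scikit-learn. The sum always runs over co-rated items; `user_means` picks whether each user's mean is taken over the co-rated items (which reproduces Pearson's r on that subset) or over every item that user rated:

```python
import numpy as np

def rating_pearson(u, v, missing=0.0, user_means="co_rated"):
    """Hypothetical Pearson-style similarity over items both users rated.

    user_means="co_rated":  each mean uses only the co-rated items
                            (identical to Pearson's r on that subset).
    user_means="all_rated": each user's mean uses every item that user rated.
    Returns nan if fewer than two items are co-rated or the denominator is 0.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    rated_u = u != missing
    rated_v = v != missing
    common = rated_u & rated_v
    if common.sum() < 2:
        return float("nan")

    if user_means == "co_rated":
        mu_u, mu_v = u[common].mean(), v[common].mean()
    elif user_means == "all_rated":
        mu_u, mu_v = u[rated_u].mean(), v[rated_v].mean()
    else:
        raise ValueError("user_means must be 'co_rated' or 'all_rated'")

    # Deviations are taken only on the co-rated items.
    du = u[common] - mu_u
    dv = v[common] - mu_v
    denom = np.sqrt(np.sum(du ** 2) * np.sum(dv ** 2))
    if denom == 0:
        return float("nan")
    return float(np.sum(du * dv) / denom)

# Illustrative ratings; 0 means "unrated".
u = np.array([5, 3, 0, 4, 0, 2], dtype=float)
v = np.array([4, 0, 1, 5, 2, 1], dtype=float)
r_co = rating_pearson(u, v, user_means="co_rated")
r_all = rating_pearson(u, v, user_means="all_rated")
```

The "all_rated" variant is no longer exactly Pearson's r, which is part of why it may fit better in recommender-specific code than as an option on a general statistics routine.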
Vinayak
------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the
conversation now. http://goparallel.sourceforge.net/
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general