Re: Recommender's formula

Paulo Villegas Mon, 26 Nov 2012 12:20:38 -0800

The thing is, in an Item- or User- based neighborhood recommender,
there's more than one thing that can be centered :-)

What those papers talk about (from memory, it's been a while since I
last read them, and I don't have them at hand now) is centering each
preference around the user's (or item's) average before entering it
in the neighborhood formula, and then moving the result back to the usual
range by adding back the average preference (this time for the target
item or user).

This is something that the code in Mahout does not currently do. You can
check for yourself; the formula is pretty straightforward:

https://github.com/apache/mahout/blob/trunk/core/src/main/java/org/apache/mahout/cf/taste/impl/recommender/GenericItemBasedRecommender.java#L230
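To make the difference concrete, here is a rough sketch in Python (not
Mahout code; the function names and the toy numbers are made up for
illustration). The first function is a plain similarity-weighted average of
the neighbors' preferences, with no centering at all. The second centers each
neighbor's preference on that neighbor's own average and adds back the
target's average, in the spirit of the formulas in those papers. Dividing by
the sum of absolute similarities is an assumption of this sketch.

```python
def weighted_average(neighbors):
    """Uncentered estimate. neighbors: list of (similarity, preference)."""
    num = sum(s * r for s, r in neighbors)
    den = sum(abs(s) for s, _ in neighbors)
    return num / den


def mean_centered(target_mean, neighbors):
    """Centered estimate.

    neighbors: list of (similarity, preference, neighbor_mean).
    Each preference is shifted by its own neighbor's average, and the
    target's average is added back at the end.
    """
    num = sum(s * (r - m) for s, r, m in neighbors)
    den = sum(abs(s) for s, _, _ in neighbors)
    return target_mean + num / den


# Hypothetical toy data: two neighbors, target average of 3.0.
plain = weighted_average([(0.8, 5.0), (0.2, 3.0)])
centered = mean_centered(3.0, [(0.8, 5.0, 4.5), (0.2, 3.0, 2.0)])
print(plain, centered)
```

With these numbers the plain estimate comes out around 4.6 and the centered
one around 3.6: the neighbors rate only a little above their own averages,
so the centered estimate stays close to the target's average.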

Now, what the Mahout code does is center the preference data when
computing user & item similarities (the ones that will later go into the
final recommender equation mentioned above). Or rather it *can* center,
since this is an optional feature of the similarity metric that you can
switch on or off. For instance, it's activated for PearsonCorrelation (the
most "typical" similarity), but in general terms any similarity metric
inheriting from AbstractSimilarity can use centering. Again, check the code:

https://github.com/apache/mahout/blob/trunk/core/src/main/java/org/apache/mahout/cf/taste/impl/similarity/AbstractSimilarity.java#L134
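As an illustration of this second kind of centering, here is a minimal
Python sketch (again not Mahout code; the names and data are hypothetical).
It computes a cosine similarity with and without subtracting each vector's
mean first. Cosine on mean-centered vectors is the Pearson correlation. The
sketch assumes both vectors hold preferences for the same co-rated items, in
the same order.

```python
import math


def cosine(x, y):
    """Cosine similarity of two equal-length preference vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def center(x):
    """Subtract the vector's own mean from every entry."""
    mean = sum(x) / len(x)
    return [a - mean for a in x]


def centered_cosine(x, y):
    """Cosine on mean-centered data, i.e. Pearson correlation."""
    return cosine(center(x), center(y))


# Hypothetical users with the same taste but a different rating scale.
a = [5.0, 4.0, 3.0]
b = [3.0, 2.0, 1.0]
print(cosine(a, b), centered_cosine(a, b))
```

The two users rank the items identically, but one rates consistently higher.
Without centering the similarity is a bit below 1; with centering the offset
disappears and the similarity is 1 (up to rounding).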

So, in summary, Mahout does one of the centerings, but not the other.
Which is best depends somewhat on the use case and the dataset features;
if I were to give a global opinion, I'd say when in doubt do both:
centering mostly helps, and rarely hurts. The same goes for other kinds of
regularization, such as Bayesian-like estimation, etc. But of course YMMV.

Regards

Paulo

On 26/11/12 20:10, Evgeny Karataev wrote:

Hello,

I've read the Mahout in Action book; then this paper - "Case Study Evaluation
of Mahout as a Recommender Platform" (
http://ir.ii.uam.es/rue2012/papers/rue2012-seminario.pdf); and then this
comment by Sean Owen (
http://mail-archives.apache.org/mod_mbox/mahout-user/201210.mbox/%3CCAEccTyzRzhRzUi9FGCPhPqa01bei=wyctx2kewocpfvu37p...@mail.gmail.com%3E)
and now I am confused about which formula is used for user-based (and
item-based) recommendations. What paper is it based on?

Does it use mean centering as in the formula in Resnick's paper (
http://dl.acm.org/citation.cfm?id=192905) or formula 4.15 in "A
Comprehensive Survey of Neighborhood-based Recommendation Methods" (
http://www.springerlink.com/content/n3jq77686228781n/)? Or are the authors
of "Case Study Evaluation of Mahout as a Recommender Platform" right, and it
computes recommendations in a way similar to formula 4.12 in "A Comprehensive
Survey of Neighborhood-based Recommendation Methods"?


Following the algorithm in the Mahout in Action book, it does not seem like
it uses mean centering. However, in the section about Cosine similarity, the
authors state that the input is mean centered.


Thank you.


