The thing is, in an item- or user-based neighborhood recommender, there's more than one thing that can be centered :-)
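To make the two candidates concrete, here is a minimal Python sketch (not Mahout's code, just a toy user-based recommender with made-up ratings). The names `ratings`, `similarity`, `predict`, `center` and `center_prediction` are all hypothetical. `center` toggles centering inside the similarity computation (which turns cosine into Pearson correlation), while `center_prediction` toggles centering in the final neighborhood formula. The denominator uses absolute weights, as in the textbook form; that is an assumption of this sketch, not a statement about any particular implementation.

```python
from math import sqrt

# Toy ratings: user -> {item: rating}. All values are invented for illustration.
ratings = {
    "alice": {"a": 5.0, "b": 3.0, "c": 4.0},
    "bob":   {"a": 4.0, "b": 2.0, "c": 3.0, "d": 5.0},
    "carol": {"a": 1.0, "b": 5.0, "d": 2.0},
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def similarity(u, v, center):
    """Cosine similarity over co-rated items.

    With center=True each vector is shifted by its own mean over the
    co-rated items first, which makes this the Pearson correlation.
    """
    common = sorted(set(ratings[u]) & set(ratings[v]))
    x = [ratings[u][i] for i in common]
    y = [ratings[v][i] for i in common]
    if center:
        mx, my = mean(x), mean(y)
        x = [a - mx for a in x]
        y = [b - my for b in y]
    dot = sum(a * b for a, b in zip(x, y))
    norm = sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def predict(user, item, center_prediction):
    """Similarity-weighted average of the neighbours' ratings for `item`.

    With center_prediction=True each neighbour's rating is taken relative
    to that neighbour's mean, and the target user's mean is added back.
    """
    num = den = 0.0
    for other, prefs in ratings.items():
        if other == user or item not in prefs:
            continue
        w = similarity(user, other, center=True)
        offset = mean(prefs.values()) if center_prediction else 0.0
        num += w * (prefs[item] - offset)
        den += abs(w)
    if den == 0.0:
        return None  # no usable neighbour rated this item
    base = mean(ratings[user].values()) if center_prediction else 0.0
    return base + num / den
```

Calling `similarity("alice", "carol", center=False)` and then again with `center=True` shows the first effect: the uncentered value is positive although the two users disagree, while the centered one is negative. Comparing `predict("alice", "d", False)` with `predict("alice", "d", True)` shows the second.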
What those papers talk about (from memory; it's been a while since I last read them, and I don't have them at hand now) is centering each preference around the user's (or item's) average before entering it into the neighborhood formula, and then moving the result back to its usual range by adding back the average preference (this time for the target item or user). This is something that the code in Mahout does not currently do. You can check for yourself, the formula is pretty straightforward:

https://github.com/apache/mahout/blob/trunk/core/src/main/java/org/apache/mahout/cf/taste/impl/recommender/GenericItemBasedRecommender.java#L230

Now, what the Mahout code does is center the preference data when computing user and item similarities (the ones that will later go into the final recommender equation mentioned above). Or rather, it *can* center, since this is an optional feature of the similarity metric. You can configure whether it is applied: for instance, it's activated for PearsonCorrelation (the most "typical" similarity), but in general any similarity metric inheriting from AbstractSimilarity can use centering. Again, check the code:

https://github.com/apache/mahout/blob/trunk/core/src/main/java/org/apache/mahout/cf/taste/impl/similarity/AbstractSimilarity.java#L134

So, in summary, Mahout does one of the centerings, but not the other. Which is best depends somewhat on the use case and the dataset features; if I were to give a global opinion, I'd say when in doubt do both: centering mostly helps, and rarely hurts. As do other kinds of regularization, such as Bayesian-like estimation, etc. But of course YMMV.

Regards,
Paulo

On 26/11/12 20:10, Evgeny Karataev wrote:
Hello,

I've read the Mahout in Action book; then this paper, "Case Study Evaluation of Mahout as a Recommender Platform" (http://ir.ii.uam.es/rue2012/papers/rue2012-seminario.pdf); and then this comment by Sean Owen (http://mail-archives.apache.org/mod_mbox/mahout-user/201210.mbox/%3CCAEccTyzRzhRzUi9FGCPhPqa01bei=wyctx2kewocpfvu37p...@mail.gmail.com%3E), and now I am confused about which formula is used for user-based (and item-based) recommendations. What paper is it based on?

Does it use mean centering, as in the formula in Resnick's paper (http://dl.acm.org/citation.cfm?id=192905) or formula 4.15 in "A Comprehensive Survey of Neighborhood-based Recommendation Methods" (http://www.springerlink.com/content/n3jq77686228781n/)? Or are the authors of "Case Study Evaluation of Mahout as a Recommender Platform" right, and it computes recommendations in a way similar to formula 4.12 in "A Comprehensive Survey of Neighborhood-based Recommendation Methods"?

Following the algorithm in the Mahout in Action book, it does not seem like it uses mean centering. However, in the section about cosine similarity, the authors state that the input is mean centered.

Thank you.
