> What do you mean here? You never need to actually subtract the mean
> from the data. The similarity metric's math is just adjusted to work
> as if it were. So no there is no idea of adding back a mean. I don't
> think there's something not implemented.

No, I'm not talking about the similarity metric; as I said, the
computation of the similarity metric *is* centred (or can be, since the
code has that option).

But once the similarities are computed, you go on to use them to
predict ratings for unknown items. It is in this rating prediction step
that mean centering (or, more generally, rating normalization) is not
done and could be done.

The papers mentioned in the original post explain it; I just searched
around and found another one that also covers it:

"An Empirical Analysis of Design Choices in Neighborhood-Based
Collaborative Filtering Algorithms"

(googling it will give you a PDF right away). The rating prediction is
Equation 1, and there you can see what I mean by mean centering in the
prediction.

Basically, you use the similarities you have already computed as weights
for the averaging sum that produces the prediction, but those weights do
not multiply the bare ratings; they multiply each rating's deviation
from that user's average rating (Equation 1 is the user-based form).

The rationale is that each user's scale is different and tends to
cluster ratings around a different mean. By subtracting that mean, only
the user's perceived difference between that item and her average
opinion enters the equation, and the user's mean opinion (which would
introduce bias) is factored out. Then we add the target user's average
rating back to the result, which restores the normal range for the
prediction, but this time using the target user's own bias. This helps
produce predictions more in line with the target user's own scale.
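To make the formula concrete, here is a minimal Python sketch of that mean-centered, user-based prediction (Equation 1). The function name and data layout are my own for illustration; this is not Mahout's API:

```python
def predict(target_user, item, ratings, sims, neighbors):
    """Mean-centered user-based prediction (Equation 1 style).

    ratings:   dict user -> dict item -> rating
    sims:      dict (u, v) -> similarity weight
    neighbors: users similar to target_user who have rated `item`
    """
    mean_u = sum(ratings[target_user].values()) / len(ratings[target_user])
    num = 0.0
    den = 0.0
    for v in neighbors:
        mean_v = sum(ratings[v].values()) / len(ratings[v])
        # Weight the neighbor's *deviation* from their own mean,
        # not their bare rating.
        num += sims[(target_user, v)] * (ratings[v][item] - mean_v)
        den += abs(sims[(target_user, v)])
    if den == 0:
        return mean_u  # no usable neighbors: fall back to the user's mean
    # Adding mean_u back restores the normal rating range,
    # but on the target user's own scale.
    return mean_u + num / den
```

Note that the non-normalized variant would simply return `num / den` computed over the bare ratings, which is where the neighbor users' own biases would leak into the prediction.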

The same paper explains it later on (more eloquently than me :-) in
section 7.1, in the more general context of rating normalization
(it also proposes z-score as a more elaborate choice, and evaluates the
results).
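For completeness, a sketch of the z-score variant that section 7.1 discusses: each deviation is additionally divided by the neighbor's standard deviation, and the result is rescaled by the target user's. Again, names and data layout are hypothetical, not Mahout's:

```python
import statistics

def predict_zscore(target_user, item, ratings, sims, neighbors):
    """Z-score-normalized user-based prediction.

    Like mean centering, but each neighbor's deviation is divided by
    that neighbor's std-dev, and the weighted average is rescaled by
    the target user's std-dev before adding the target mean back.
    """
    u_vals = list(ratings[target_user].values())
    mean_u = statistics.mean(u_vals)
    std_u = statistics.pstdev(u_vals)
    num = 0.0
    den = 0.0
    for v in neighbors:
        v_vals = list(ratings[v].values())
        mean_v = statistics.mean(v_vals)
        std_v = statistics.pstdev(v_vals)
        if std_v == 0:
            continue  # constant rater: the deviation carries no signal
        num += sims[(target_user, v)] * (ratings[v][item] - mean_v) / std_v
        den += abs(sims[(target_user, v)])
    if den == 0:
        return mean_u
    return mean_u + std_u * num / den
```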

Paulo

On 26/11/12 21:51, Sean Owen wrote:

On Mon, Nov 26, 2012 at 8:20 PM, Paulo Villegas <[email protected]> wrote:
The thing is, in an Item- or User- based neighborhood recommender,
there's more than one thing that can be centered :-)

What those papers talk about (from memory; it's been a while since I
last read them, and I don't have them at hand now) is centering the
preference around the user's (or item's) average before entering it
into the neighborhood formula, and then moving it back to its usual
range by adding back the average preference (this time for the target
item or user).

This is something that the code in Mahout does not currently do. You can
check for yourself; the formula is pretty straightforward:

