Note that if you do implement mean centering, it solves that
interpretation issue. A prediction of -3 then means "a prediction 3
below the user's mean", so it's still valid on the 1-5 scale (once you
add the user's mean back, it returns to the scale, though the result
may need capping).
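In code, the denormalization step is tiny; a minimal sketch in Java
(the class name, method name and the 1-5 clamping bounds are mine, just
for illustration):

public final class Denormalize {
  // Map a mean-centered estimate back onto the rating scale.
  // Assumes a 1-5 scale; adjust the bounds for other scales.
  static float toRatingScale(float centeredEstimate, float userMean) {
    float estimate = userMean + centeredEstimate;
    // Cap the result: mean + deviation can fall outside the scale.
    return Math.max(1.0f, Math.min(5.0f, estimate));
  }
}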

But you're right that implementing it requires carrying around rating
means. I did that, augmenting the DataModel with the needed data
(basically a bunch of RunningAverage objects), but the result wasn't
pretty :-), so I did not submit it as a patch.
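For the record, the idea was roughly this (a sketch from memory, not
the actual patch; the class and method names are mine):

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.common.FullRunningAverage;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.impl.common.RunningAverage;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.PreferenceArray;

final class UserMeans {
  // Build a map from user ID to that user's running mean rating.
  static FastByIDMap<RunningAverage> build(DataModel model) throws TasteException {
    FastByIDMap<RunningAverage> means = new FastByIDMap<RunningAverage>();
    LongPrimitiveIterator userIDs = model.getUserIDs();
    while (userIDs.hasNext()) {
      long userID = userIDs.nextLong();
      RunningAverage avg = new FullRunningAverage();
      PreferenceArray prefs = model.getPreferencesFromUser(userID);
      for (int i = 0; i < prefs.length(); i++) {
        avg.addDatum(prefs.getValue(i));
      }
      means.put(userID, avg);
    }
    return means;
  }
}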



This is a good discussion of the issue.

https://issues.apache.org/jira/browse/MAHOUT-898

Negative weights are problematic. I think taking the absolute value
gives slightly less explainable results, but that's up to taste. For
example, with an absolute-value denominator, a rating of 3 weighted by
-4 results in a prediction of (3 × -4) / |-4| = -3. It's not clear that
-3 represents "the opposite of 3", and it doesn't in a 1-5 rating
scale, for example. Really, negative weights are votes to be
infinitely far from a value, and that is weird. Don't do it.
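To make the arithmetic concrete, here is a throwaway comparison of the
two denominators for that single-neighbor example (class and variable
names are mine):

// Toy single-neighbor case: rating 3, similarity (weight) -4.
public final class NegativeWeightDemo {
  public static void main(String[] args) {
    double rating = 3.0;
    double similarity = -4.0;
    double weighted = similarity * rating;               // -12.0
    // Signed denominator, as in the Mahout snippet quoted below: prints 3.0
    System.out.println(weighted / similarity);
    // Absolute-value denominator, as in the survey's formula: prints -3.0
    System.out.println(weighted / Math.abs(similarity));
  }
}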

On Mon, Nov 26, 2012 at 9:51 PM, Evgeny Karataev
<[email protected]> wrote:
Thank you Sean and Paulo.

Paulo, I guess in my original email I meant what you said in your last
email (about rating normalization). So that part is not done.

I've looked at the code
https://github.com/apache/mahout/blob/trunk/core/src/main/java/org/apache/mahout/cf/taste/impl/recommender/GenericItemBasedRecommender.java#L230

and the formula looks almost exactly like formula 4.12 in "A Comprehensive
Survey of Neighborhood-based Recommendation Methods" (
http://www.springerlink.com/content/n3jq77686228781n/); however, the
difference is that you divide the weighted preference by totalSimilarity:

...
// Weights can be negative!
preference += theSimilarity * preferencesFromUser.getValue(i);
totalSimilarity += theSimilarity;
...
float estimate = (float) (preference / totalSimilarity);
...

In contrast, in other papers the denominator is the sum of the absolute
values of the similarities.

If I am not mistaken, and as the comment in the code states, weights
(similarities) can be negative. And they might actually sum to 0.
Then you would divide preference by 0. What would be the estimate in
that case?
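For what it's worth, plain Java float division by zero does not throw;
the estimate would silently come out as Infinity or NaN, as this
throwaway demo shows:

public final class DivideByZeroDemo {
  public static void main(String[] args) {
    System.out.println(1.0f / 0.0f);    // Infinity
    System.out.println(-1.0f / 0.0f);   // -Infinity
    System.out.println(0.0f / 0.0f);    // NaN
  }
}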




On Mon, Nov 26, 2012 at 4:32 PM, Paulo Villegas <[email protected]> wrote:

What do you mean here? You never need to actually subtract the mean
from the data. The similarity metric's math is just adjusted to work
as if it were. So no, there is no idea of adding back a mean. I don't
think there's anything not implemented.

No, it's not about the similarity metric; as I said, the computation of the
similarity metric *is* centered (or can be, the code has that option).

But once you have the similarities computed, you go on and use them to
predict ratings for unknown items. It's in this rating prediction that
mean centering (or, to be more general, rating normalization) is not
done and could be done.

The papers mentioned in the original post explain it; I just searched
around and found another one that also mentions it:

"An Empirical Analysis of Design Choices in Neighborhood-Based
Collaborative Filtering Algorithms"

(googling it will give you a PDF right away). The rating prediction is
Equation 1, and there you can see what I mean by mean centering in the
prediction.

Basically, you use the similarities you have already computed as weights
for the averaging sum that creates the prediction, but those weights do
not multiply the neighbors' bare ratings; they multiply each rating's
deviation from that user's average rating (Equation 1 is the user-based
version).
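Writing it from memory (notation is mine, so double-check against the
paper), the user-based mean-centered prediction has this shape, in LaTeX:

    \hat{r}_{u,i} = \bar{r}_u
        + \frac{\sum_{v \in N(u)} w_{u,v} \, (r_{v,i} - \bar{r}_v)}
               {\sum_{v \in N(u)} |w_{u,v}|}

where N(u) is the neighborhood of user u that rated item i, w_{u,v} is
the user-user similarity, and \bar{r}_u, \bar{r}_v are the users' mean
ratings.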

The rationale is that each user's scale is different and tends to
cluster ratings around a different mean. By subtracting that mean, only
the user's perceived difference between that item and her average
opinion enters the equation, and the user's mean opinion (which would
introduce some bias) is factored out. Then we add back to the result the
average rating of the target user, which restores the normal range for
the prediction, but this time using the target user's own bias. This
helps achieve predictions more in line with the target user's own scale.

The same paper explains it later on (more eloquently than me :-) in
section 7.1, in the more general context of rating normalization
(also proposing z-score as a more elaborate choice, and evaluating
the results).
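Again from memory (so check section 7.1 for the exact form), the z-score
variant also divides each deviation by the rater's standard deviation,
and multiplies back by the target user's on the way out:

    \hat{r}_{u,i} = \bar{r}_u
        + \sigma_u \, \frac{\sum_{v \in N(u)} w_{u,v} \, (r_{v,i} - \bar{r}_v) / \sigma_v}
                           {\sum_{v \in N(u)} |w_{u,v}|}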

Paulo


On 26/11/12 21:51, Sean Owen wrote:


On Mon, Nov 26, 2012 at 8:20 PM, Paulo Villegas <[email protected]> wrote:

The thing is, in an Item- or User- based neighborhood recommender,
there's more than one thing that can be centered :-)

What those papers talk about (from memory, it's been a while since I
last read them, and I don't have them at hand now) is centering the
preference around the user's (or item's) average before entering it
in the neighborhood formula, and then moving it back to its usual
range by adding back the average preference (this time for the target
user or item).

This is something that the code in Mahout does not currently do. You can
check for yourself; the formula is pretty straightforward:







--
Best Regards,
Evgeny Karataev


