This is a good discussion of the issue.

https://issues.apache.org/jira/browse/MAHOUT-898

Negative weights are problematic. I think taking the absolute value in
the denominator gives slightly less explainable results, but that's a
matter of taste. For example, a rating of 3 weighted by -4 results in a
prediction of -3 (the numerator is -12, the absolute-value denominator
is 4). It's not clear that -3 represents "the opposite of 3", and on a
1-5 rating scale, for example, it doesn't. Really, negative weights are
votes to be infinitely far from a value, and that is weird. Don't do it.
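
To make that arithmetic concrete, here's a minimal sketch in plain Java
(made-up values, not Mahout code) of the two denominator choices:

    // one rating of 3.0 with a similarity weight of -4.0
    double weightedSum = -4.0 * 3.0;              // -12.0
    double totalSimilarity = -4.0;                // signed denominator
    double totalAbsSimilarity = Math.abs(-4.0);   //  4.0

    double signedEstimate = weightedSum / totalSimilarity;   //  3.0
    double absEstimate = weightedSum / totalAbsSimilarity;   // -3.0, the "-3" above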

On Mon, Nov 26, 2012 at 9:51 PM, Evgeny Karataev
<[email protected]> wrote:
> Thank you Sean and Paulo.
>
> Paulo, I guess in my original email I meant what you said in your last
> email (about rating normalization). So that part is not done.
>
> I've looked at the code
> https://github.com/apache/mahout/blob/trunk/core/src/main/java/org/apache/mahout/cf/taste/impl/recommender/GenericItemBasedRecommender.java#L230
>
> and the formula looks almost exactly like formula 4.12 in "A Comprehensive
> Survey of Neighborhood-based Recommendation Methods" (
> http://www.springerlink.com/content/n3jq77686228781n/); however, the
> difference is that you divide the weighted preference by totalSimilarity
>
>    ...
>
>    // Weights can be negative!
>    preference += theSimilarity * preferencesFromUser.getValue(i);
>    totalSimilarity += theSimilarity;
>    ...
>    float estimate = (float) (preference / totalSimilarity);
>    ...
>
> In contrast, in other papers the denominator is the sum of the absolute
> values of the similarities.
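>
> A one-line sketch of that variant (my own illustration, not actual
> Mahout code) would accumulate:
>
>     totalSimilarity += Math.abs(theSimilarity); // denominator = sum of |sim|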
>
> If I am not mistaken, and as the comment in the code states, weights
> (similarities) can be negative, and they might actually sum to 0. Then
> you would divide preference by 0. What would be the estimate in that
> case?
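>
> For instance, a minimal sketch of the Java semantics here (made-up
> values, not the Mahout code), with two similarities that cancel out:
>
>     double preference = 0.5 * 4.0 + (-0.5) * 2.0;  // 1.0
>     double totalSimilarity = 0.5 + (-0.5);         // 0.0
>     float estimate = (float) (preference / totalSimilarity);
>     // estimate is Infinity; a 0.0 numerator over 0.0 would give NaN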
>
>
>
>
> On Mon, Nov 26, 2012 at 4:32 PM, Paulo Villegas <[email protected]> wrote:
>
>> > What do you mean here? You never need to actually subtract the mean
>> > from the data. The similarity metric's math is just adjusted to work
>> > as if it were. So, no, there is no idea of adding back a mean. I don't
>> > think there's anything not implemented.
>>
>> No, not about the similarity metric, as I said, the computation of the
>> similarity metric *is* centred (or can be, the code has that option).
>>
>> But once you have the similarities computed, you go on and use them to
>> predict the rating for unknown items. It's in this rating prediction
>> that mean centering (or, to be more general, rating normalization) is
>> not done, and could be done.
>>
>> The papers mentioned in the original post explain it; I just searched
>> around and found another one that also mentions it:
>>
>> "An Empirical Analysis of Design Choices in Neighborhood-Based
>> Collaborative Filtering Algorithms"
>>
>> (googling it will give you a PDF right away). The rating prediction is
>> Equation 1, and there you can see what I mean by mean centering in the
>> prediction.
>>
>> Basically, you use the similarities you have already computed as weights
>> for the averaging sum that creates the prediction, but those weights do
>> not multiply the bare ratings for the other items; they multiply their
>> deviation from each user's average rating (equation 1 is for user-based).
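>>
>> A rough sketch in Java of that mean-centered prediction (helper names
>> like similarity(), rating() and meanRating() are made up, not the
>> Mahout API):
>>
>>     // user-based prediction with mean centering (equation 1 in the paper)
>>     double numerator = 0.0;
>>     double denominator = 0.0;
>>     for (long v : neighborsOf(u)) {              // neighbors of target user u
>>       double sim = similarity(u, v);
>>       numerator += sim * (rating(v, i) - meanRating(v));
>>       denominator += Math.abs(sim);
>>     }
>>     double estimate = meanRating(u) + numerator / denominator;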
>>
>> The rationale is that each user's scale is different, and tends to
>> cluster ratings around a different mean. By subtracting that mean, we
>> get into the equation only the user's perceived difference between that
>> item and her average opinion, and factor out the user's mean opinion
>> (which would introduce some bias). Then we add back to the result the
>> average rating of the target user, which restores the normal range for
>> the prediction, but this time using the target user's own bias. This
>> helps to achieve predictions more in line with the target user's own scale.
>>
>> The same paper explains it later on (more eloquently than I do :-) in
>> section 7.1, in the more general context of rating normalization (also
>> proposing z-score as a more elaborate choice, and evaluating the
>> results).
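>>
>> In the z-score variant, the same sketch would also divide each
>> deviation by that user's rating standard deviation (stdevRating() is
>> again a made-up helper) and scale back at the end:
>>
>>     numerator += sim * (rating(v, i) - meanRating(v)) / stdevRating(v);
>>     ...
>>     double estimate = meanRating(u) + stdevRating(u) * (numerator / denominator);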
>>
>> Paulo
>>
>>
>> On 26/11/12 21:51, Sean Owen wrote:
>>
>>>
>>> On Mon, Nov 26, 2012 at 8:20 PM, Paulo Villegas <[email protected]> wrote:
>>>
>>>> The thing is, in an Item- or User- based neighborhood recommender,
>>>> there's more than one thing that can be centered :-)
>>>>
>>>> What those papers talk about (from memory, it's been a while since I
>>>> last read them, and I don't have them at hand now) is centering the
>>>> preference around the user's (or item's) average before entering it
>>>> into the neighborhood formula, and then moving it back to its usual
>>>> range by adding back the average preference (this time for the target
>>>> item or user).
>>>>
>>>> This is something that the code in Mahout does not currently do. You
>>>> can check for yourself; the formula is pretty straightforward:
>>>>
>>>
>>
>
>
>
> --
> Best Regards,
> Evgeny Karataev
