Ok Sebastian, thanks for the explanation. I'll study each similarity in more
detail.

On Sun, Nov 28, 2010 at 11:37 AM, Sebastian Schelter <[email protected]> wrote:

> Pearson correlation and boolean data don't fit: all co-occurring ratings
> will have value 1, so the compared vectors are identical and constant
> (zero variance), and therefore no correlation can be computed.
>
> --sebastian
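
A minimal sketch of the point above, assuming a textbook Pearson formula rather than Mahout's actual implementation (the function name `pearson` and the sample vectors are made up for illustration). When every co-occurring rating is 1, both vectors have zero variance, the denominator is 0, and the correlation is undefined:

```python
import math


def pearson(xs, ys):
    """Textbook Pearson correlation; returns NaN when it is undefined.

    Illustrative only: this is not the code Mahout runs.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = math.sqrt(var_x * var_y)
    if denom == 0.0:
        # At least one vector is constant, so there is no variance to correlate.
        return float("nan")
    return cov / denom


# Boolean data: every co-occurring preference is 1.0 (hypothetical sample).
boolean_a = [1.0, 1.0, 1.0]
boolean_b = [1.0, 1.0, 1.0]

# Rated data with real variation, for contrast (hypothetical sample).
rated_a = [1.0, 3.0, 5.0]
rated_b = [2.0, 3.0, 5.0]

boolean_result = pearson(boolean_a, boolean_b)
rated_result = pearson(rated_a, rated_b)
```

Here `boolean_result` is NaN while `rated_result` is a usable positive value. If every pairwise similarity is undefined in this way, there is nothing left to build recommendations from, which would be consistent with the empty output file described in the thread.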
>
> On 28.11.2010 11:28, Jordi Abad wrote:
>
> Hi,
>
> I applied the changes of MAHOUT-553 (thanks Sebastian!) against mahout-0.4.
> Everything makes sense now. I've tried it with different similarities
> (SIMILARITY_LOGLIKELIHOOD, SIMILARITY_TANIMOTO_COEFFICIENT,
> SIMILARITY_UNCENTERED_COSINE) and it works fine (i.e. I got good
> recommendations with different scores), but when I tried
> SIMILARITY_PEARSON_CORRELATION, I got an empty part-00000 file. Is that
> expected?
>
> On Fri, Nov 26, 2010 at 7:50 PM, Sean Owen <[email protected]> wrote:
>
>> The behavior difference is fairly simple. Instead of a weighted
>> average of preferences (which will always equal 1.0), compute some
>> other function of those weights -- for example, the average of the
>> weights.
>>
>> See GenericBooleanPrefItemBasedRecommender. It's actually just summing
>> the weights. This is nearly the same thing since the number of items
>> participating in the average is the same for all estimates. *Nearly*
>> the same since some can be NaN.
>>
>> It's an open question whether there are better functions of the
>> weights to use, but this is a fine start, IMHO.
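
The difference described above can be sketched as follows. This is an illustration under stated assumptions, not the code in GenericBooleanPrefItemBasedRecommender: the function names, the sample weights, and the choice to skip NaN weights are all hypothetical. With boolean data every preference is 1.0, so a weighted average is 1.0 for every candidate item, while the sum of the similarity weights still separates the candidates:

```python
import math


def weighted_average(prefs, weights):
    """Classic estimate: similarity-weighted average of the user's preferences."""
    return sum(p * w for p, w in zip(prefs, weights)) / sum(weights)


def sum_of_weights(weights):
    """Boolean-data alternative: add up the similarity weights.

    Skipping NaN weights is an assumption made for this sketch.
    """
    return sum(w for w in weights if not math.isnan(w))


# Hypothetical similarities between each candidate item and the two items
# the user already has. With boolean data every preference is 1.0.
candidates = {
    "item_x": [0.9, 0.8],
    "item_y": [0.2, 0.1],
}
prefs = [1.0, 1.0]

averages = {item: weighted_average(prefs, w) for item, w in candidates.items()}
sums = {item: sum_of_weights(w) for item, w in candidates.items()}

# Rank candidates by the sum of weights, best first.
ranking = sorted(sums, key=sums.get, reverse=True)
```

Both entries in `averages` come out as 1.0, so they give no basis for ordering, whereas `ranking` places `item_x` ahead of `item_y`.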
>>
>>
>> On Fri, Nov 26, 2010 at 6:45 PM, Sebastian Schelter <[email protected]>
>> wrote:
>> > Hi Sean,
>> >
>> > the prediction computation for boolean data is done in
>> > AggregateAndRecommendReducer.reduceBooleanData()
>> >
>> > It computes *all* possible items to recommend for the current user and
>> > then writes out only the first n, with n being the number specified in
>> > the --numRecommendations parameter given to RecommenderJob.
>> >
>> > Can you point me to the code where the non-distributed code handles the
>> > problem of ranking them? We could certainly emulate that behaviour in
>> > the distributed code too.
>> >
>> > --sebastian
>> >
>> >
>> >
>> > On 26.11.2010 19:35, Sean Owen wrote:
>> >> But is it then ranking the recommendations by the estimated pref? If
>> >> it's always 1, then the ordering is not meaningful.
>> >>
>> >> Maybe it is, I just haven't looked at your changes in much detail
>> >> since you made them although it looked broadly correct and proper.
>> >>
>> >> On Fri, Nov 26, 2010 at 6:33 PM, Sebastian Schelter <[email protected]>
>> wrote:
>> >>
>> >>> If all ratings have value 1 (because we use boolean data), the result
>> >>> of the prediction can also only be 1.
>> >>>
>> >
>> >
>>
>
>
>
