Pearson correlation and boolean data don't fit together: all co-occurring ratings have the value 1, so the compared vectors are identical and have zero variance, and no correlation can be computed.
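To make the point concrete, here is a minimal Python sketch of a plain Pearson correlation. It is not Mahout's implementation; the `pearson` helper and the sample vectors are made up for illustration. With boolean data both vectors consist only of 1s, so the variances are zero and the formula ends up as 0/0.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length vectors, NaN when undefined."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, left unnormalised because the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = math.sqrt(var_x * var_y)
    if denom == 0.0:
        # At least one vector is constant, so the correlation is undefined.
        return float("nan")
    return cov / denom


# Boolean data: every co-occurring rating is 1 (hypothetical sample).
boolean_a = [1.0, 1.0, 1.0]
boolean_b = [1.0, 1.0, 1.0]

# Real-valued ratings for comparison (hypothetical sample).
rated_a = [1.0, 3.0, 5.0]
rated_b = [2.0, 3.0, 5.0]

print(pearson(boolean_a, boolean_b))  # undefined for constant vectors
print(pearson(rated_a, rated_b))      # a usable similarity value
```

The similarities that worked in your test do not centre the vectors around their mean, which is why they still produce usable values on boolean input.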
--sebastian

On 28.11.2010 11:28, Jordi Abad wrote:
> Hi,
>
> I applied the changes of MAHOUT-553 (thanks Sebastian!) against
> mahout-0.4. Everything makes sense now. I've tried it with different
> similarities (SIMILARITY_LOGLIKELIHOOD,
> SIMILARITY_TANIMOTO_COEFFICIENT, SIMILARITY_UNCENTERED_COSINE) and it
> works fine (i.e. I got good recommendations with different scores) but
> when I tried SIMILARITY_PEARSON_CORRELATION, I got an empty part-00000
> file. Is it normal?
>
> On Fri, Nov 26, 2010 at 7:50 PM, Sean Owen <[email protected]> wrote:
>
> > The behavior difference is fairly simple. Instead of a weighted
> > average of preferences (which will always equal 1.0), compute some
> > other function of those weights -- for example, the average of the
> > weights.
> >
> > See GenericBooleanPrefItemBasedRecommender. It's actually just summing
> > the weights. This is nearly the same thing since the number of items
> > participating in the average is the same for all estimates. *Nearly*
> > the same since some can be NaN.
> >
> > It's an open question whether there aren't better functions of the
> > weights to use, but this is a fine start, IMHO.
> >
> > On Fri, Nov 26, 2010 at 6:45 PM, Sebastian Schelter <[email protected]> wrote:
> > > Hi Sean,
> > >
> > > the prediction computation for boolean data is done in
> > > AggregateAndRecommendReducer.reduceBooleanData()
> > >
> > > It computes *all* possible items to recommend for the current user and
> > > writes out only the n first after that, with n being the number
> > > specified in the parameter --numRecommendations given to RecommenderJob.
> > >
> > > Can you point me to the code where the non-distributed code handles the
> > > problem of ranking them? We could certainly emulate that behaviour in
> > > the distributed code too.
> > >
> > > --sebastian
> > >
> > > On 26.11.2010 19:35, Sean Owen wrote:
> > > > But is it then ranking the recommendations by the estimated pref? If
> > > > it's always 1, then the ordering is not meaningful.
> > > >
> > > > Maybe it is, I just haven't looked at your changes in much detail
> > > > since you made them although it looked broadly correct and proper.
> > > >
> > > > On Fri, Nov 26, 2010 at 6:33 PM, Sebastian Schelter <[email protected]> wrote:
> > > > > If all ratings have value 1 (because we use boolean data) the result of
> > > > > the prediction can also only be 1.
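
For reference, the ranking issue discussed in the quoted messages can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of GenericBooleanPrefItemBasedRecommender or AggregateAndRecommendReducer: the function names, item names, and similarity values are invented, and skipping NaN similarities is an assumption about how undefined weights might be handled.

```python
import math


def weighted_average(similarities, prefs):
    """Similarity-weighted average of preferences, ignoring NaN weights."""
    pairs = [(s, p) for s, p in zip(similarities, prefs) if not math.isnan(s)]
    total = sum(s for s, _ in pairs)
    if total == 0.0:
        return float("nan")
    return sum(s * p for s, p in pairs) / total


def sum_of_weights(similarities):
    """Score for boolean data: sum of the similarities, ignoring NaN weights."""
    return sum(s for s in similarities if not math.isnan(s))


# Hypothetical candidates: similarities between each candidate item and the
# items the user already has. With boolean data every preference is 1.0.
candidates = {
    "item_a": [0.9, 0.8],
    "item_b": [0.2, 0.1],
}

averages = {
    item: weighted_average(sims, [1.0] * len(sims))
    for item, sims in candidates.items()
}
scores = {item: sum_of_weights(sims) for item, sims in candidates.items()}

# Rank candidates by the sum of weights, highest first.
ranked = sorted(scores, key=scores.get, reverse=True)

print(averages)  # every estimate collapses to 1.0, so it cannot order items
print(ranked)    # the sum of weights still separates the candidates
```

The weighted average is the same for every candidate because each preference is 1.0, while the sum of weights keeps the information about how strongly a candidate is related to the user's items.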
