Ok Sebastian, thanks for the explanation. I'll study each similarity in more detail.
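
To check my understanding of the Pearson case, here is a minimal sketch of the textbook Pearson formula in plain Java. It is not Mahout's implementation, and the class and method names (PearsonBooleanSketch, pearson) are my own, made up for illustration. With boolean data every co-occurring rating is 1, so both centered vectors are all zeros and the formula ends in 0/0:

```java
// Hypothetical illustration only; not taken from Mahout.
final class PearsonBooleanSketch {

    // Textbook Pearson correlation over two equally long rating arrays.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sumXY = 0.0;
        double sumXX = 0.0;
        double sumYY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX; // 0.0 for every entry when all ratings are 1
            double dy = y[i] - meanY;
            sumXY += dx * dy;
            sumXX += dx * dx;
            sumYY += dy * dy;
        }
        // For all-1 inputs this is 0.0 / 0.0, which is NaN in Java.
        return sumXY / Math.sqrt(sumXX * sumYY);
    }

    public static void main(String[] args) {
        double[] itemA = {1.0, 1.0, 1.0}; // boolean data: every co-occurring rating is 1
        double[] itemB = {1.0, 1.0, 1.0};
        System.out.println(pearson(itemA, itemB)); // prints NaN
    }
}
```

If the job drops item pairs whose similarity is NaN (my assumption, I have not checked the code), that would explain the empty part-00000 file.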
On Sun, Nov 28, 2010 at 11:37 AM, Sebastian Schelter <[email protected]> wrote:

> Pearson correlation and boolean data don't fit: all cooccurring ratings
> will have value 1, and therefore no correlation can be computed, as the
> compared vectors are identical.
>
> --sebastian
>
> On 28.11.2010 11:28, Jordi Abad wrote:
>
> Hi,
>
> I applied the changes of MAHOUT-553 (thanks Sebastian!) against mahout-0.4.
> Everything makes sense now. I've tried it with different similarities
> (SIMILARITY_LOGLIKELIHOOD, SIMILARITY_TANIMOTO_COEFFICIENT,
> SIMILARITY_UNCENTERED_COSINE) and it works fine (i.e. I got good
> recommendations with different scores), but when I tried
> SIMILARITY_PEARSON_CORRELATION, I got an empty part-00000 file. Is that
> normal?
>
> On Fri, Nov 26, 2010 at 7:50 PM, Sean Owen <[email protected]> wrote:
>
>> The behavior difference is fairly simple. Instead of a weighted
>> average of preferences (which will always equal 1.0), compute some
>> other function of those weights -- for example, the average of the
>> weights.
>>
>> See GenericBooleanPrefItemBasedRecommender. It's actually just summing
>> the weights. This is nearly the same thing since the number of items
>> participating in the average is the same for all estimates. *Nearly*
>> the same since some can be NaN.
>>
>> It's an open question whether there aren't better functions of the
>> weights to use, but this is a fine start, IMHO.
>>
>>
>> On Fri, Nov 26, 2010 at 6:45 PM, Sebastian Schelter <[email protected]> wrote:
>> > Hi Sean,
>> >
>> > the prediction computation for boolean data is done in
>> > AggregateAndRecommendReducer.reduceBooleanData()
>> >
>> > It computes *all* possible items to recommend for the current user and
>> > writes out only the first n after that, with n being the number
>> > specified in the parameter --numRecommendations given to RecommenderJob.
>> >
>> > Can you point me to the code where the non-distributed code handles the
>> > problem of ranking them? We could certainly emulate that behaviour in
>> > the distributed code too.
>> >
>> > --sebastian
>> >
>> > On 26.11.2010 19:35, Sean Owen wrote:
>> >> But is it then ranking the recommendations by the estimated pref? If
>> >> it's always 1, then the ordering is not meaningful.
>> >>
>> >> Maybe it is, I just haven't looked at your changes in much detail
>> >> since you made them, although it looked broadly correct and proper.
>> >>
>> >> On Fri, Nov 26, 2010 at 6:33 PM, Sebastian Schelter <[email protected]> wrote:
>> >>
>> >>> If all ratings have value 1 (because we use boolean data), the result
>> >>> of the prediction can also only be 1.
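
For reference, the point in the quoted thread about the weighted average always equaling 1.0 can be sketched as follows. This is a minimal illustration in plain Java, not the code of GenericBooleanPrefItemBasedRecommender or AggregateAndRecommendReducer; the class and method names (BooleanPrefEstimateSketch, weightedAverage, sumOfWeights) and the similarity values are made up for the example.

```java
// Hypothetical illustration only; not taken from Mahout.
final class BooleanPrefEstimateSketch {

    // Classic item-based estimate: similarity-weighted average of the user's prefs.
    static double weightedAverage(double[] similarities, double[] prefs) {
        double numerator = 0.0;
        double denominator = 0.0;
        for (int i = 0; i < similarities.length; i++) {
            numerator += similarities[i] * prefs[i];
            denominator += similarities[i];
        }
        return numerator / denominator;
    }

    // Alternative described in the thread: use the summed weights as the score.
    static double sumOfWeights(double[] similarities) {
        double sum = 0.0;
        for (double s : similarities) {
            sum += s;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] booleanPrefs = {1.0, 1.0};     // boolean data: every pref is 1
        double[] simsCandidateA = {0.5, 0.25};  // made-up similarities to the user's items
        double[] simsCandidateB = {0.125, 0.125};

        // Both weighted averages are 1.0, so they cannot rank the candidates.
        System.out.println(weightedAverage(simsCandidateA, booleanPrefs));
        System.out.println(weightedAverage(simsCandidateB, booleanPrefs));

        // The summed weights differ (0.75 vs 0.25), so candidate A ranks first.
        System.out.println(sumOfWeights(simsCandidateA));
        System.out.println(sumOfWeights(simsCandidateB));
    }
}
```

The sketch ignores NaN similarities, which the thread notes can occur; how those are handled in the real code is not shown here.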
