Hi, I am a student of recommender systems and have been testing Mahout’s
recommender system (user-based and item-based collaborative filtering) for
the last few months and have a few basic questions:
(1) GenericUserBasedRecommender Prediction calculation:
I noticed that Mahout has implemented the following user-based calculation
that computes a weighted average of other users’ ratings,
preference += theSimilarity * pref;
totalSimilarity += theSimilarity;
... and later,
float estimate = (float) (preference / totalSimilarity);
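To check my reading of the snippet, here is a minimal standalone sketch of that weighted average. The `Neighbor` record and the `estimate` method are my own hypothetical names, not Mahout classes. Skipping NaN similarities and returning NaN when the total similarity is zero are assumptions of the sketch, not something I have confirmed in Mahout's source.

```java
import java.util.List;

/**
 * Standalone sketch of the user-based weighted average quoted above.
 * Neighbor is a hypothetical type (not a Mahout class): one neighbor's
 * similarity to the target user and that neighbor's rating of the item.
 */
final class UserBasedEstimate {

  record Neighbor(double similarity, float pref) {}

  /** Returns sum(sim * pref) / sum(sim), or NaN when there is nothing to average. */
  static float estimate(List<Neighbor> neighbors) {
    double preference = 0.0;
    double totalSimilarity = 0.0;
    for (Neighbor n : neighbors) {
      if (Double.isNaN(n.similarity())) {
        continue; // assumption: neighbors with undefined similarity are ignored
      }
      preference += n.similarity() * n.pref();
      totalSimilarity += n.similarity();
    }
    if (totalSimilarity == 0.0) {
      return Float.NaN; // assumption: no usable neighbors means no estimate
    }
    return (float) (preference / totalSimilarity);
  }
}
```

For two neighbors with similarities 0.9 and 0.5 who rated the item 4 and 2, this gives (3.6 + 1.0) / 1.4 ≈ 3.29.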
Although this is certainly a sound approach, other approaches have been
suggested in the literature as cited in
https://cwiki.apache.org/confluence/display/MAHOUT/Recommender+Documentation .
Can you please provide some insight as to why you selected the above
prediction calculation approach for Mahout?
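For comparison, one alternative that comes up often in the literature is the mean-centered (Resnick-style) formula: pred(u, i) = mean(u) + sum over v of sim(u, v) * (r(v, i) - mean(v)), divided by the sum over v of |sim(u, v)|. The sketch below is my own illustration, not Mahout code; `NeighborRating` and `estimate` are hypothetical names, and falling back to the user's mean when the denominator is zero is an assumption.

```java
import java.util.List;

/**
 * Sketch of a mean-centered user-based prediction, shown only for comparison.
 * This is NOT what the Mahout snippet above computes.
 * NeighborRating is a hypothetical type for illustration.
 */
final class MeanCenteredEstimate {

  record NeighborRating(double similarity, double rating, double neighborMean) {}

  static double estimate(double targetUserMean, List<NeighborRating> neighbors) {
    double numerator = 0.0;
    double denominator = 0.0;
    for (NeighborRating n : neighbors) {
      // Weight each neighbor's deviation from their own mean rating.
      numerator += n.similarity() * (n.rating() - n.neighborMean());
      // Normalize by absolute similarity so negative weights cannot shrink the denominator.
      denominator += Math.abs(n.similarity());
    }
    if (denominator == 0.0) {
      return targetUserMean; // assumption: fall back to the user's own mean
    }
    return targetUserMean + numerator / denominator;
  }
}
```

With a user mean of 3.0 and two neighbors (similarity 0.8, rating 5, mean 4) and (similarity -0.4, rating 1, mean 3), the estimate is 3.0 + (0.8 + 0.8) / 1.2 ≈ 4.33.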
(2) GenericUserBasedRecommender Similarity weighting:
I also noticed that Mahout has implemented the following
PearsonCorrelationSimilarity weighting when the WEIGHTED parameter is used
in the similarity constructor:
if (weighted) {
  double scaleFactor = 1.0 - (double) count / (double) (num + 1);
  if (result < 0.0) {
    result = -1.0 + scaleFactor * (1.0 + result);
  } else {
    result = 1.0 - scaleFactor * (1.0 - result);
  }
}
Would you please provide some insight as to why you decided to use this
weighting approach?
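To see what the weighting does, I restated it as a standalone function. I am assuming `count` is the number of items both users rated and `num` is the total number of items in the data model; if that reading is wrong, the numbers below change accordingly. The class and method names are mine.

```java
/**
 * Standalone restatement of the WEIGHTED branch quoted above.
 * Assumption: count = number of co-rated items, num = total number of items.
 */
final class WeightedSimilarity {

  static double weight(double result, int count, int num) {
    // Close to 1 when overlap is small, close to 0 when overlap approaches num.
    double scaleFactor = 1.0 - (double) count / (double) (num + 1);
    if (result < 0.0) {
      // Shrink the distance between result and -1 by scaleFactor.
      return -1.0 + scaleFactor * (1.0 + result);
    } else {
      // Shrink the distance between result and +1 by scaleFactor.
      return 1.0 - scaleFactor * (1.0 - result);
    }
  }
}
```

With num = 99, a raw correlation of 0.5 becomes about 0.505 when count = 1 but 0.95 when count = 90, and -0.5 becomes -0.95 at count = 90. So the more items two users share, the further the correlation is pushed toward +1 or -1; with little overlap it is left almost unchanged.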
(3) GenericUserBasedRecommender Similarity calculation:
It appears that Mahout calculates similarities between users to determine
the neighborhood and then again during the prediction calculation. When
running an evaluator (e.g., DifferenceRecommenderEvaluator), I can see that
the user similarities are computed repeatedly for each user. Is there a
reason it was implemented this way (a “time vs. space” tradeoff)?
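If it is a time-vs-space question, the space side would look something like the memoizing wrapper below, which computes each user pair once and reuses the value for both neighborhood selection and prediction. This is a hypothetical sketch using a plain `ToDoubleBiFunction` rather than Mahout's similarity interfaces, and it assumes the similarity is symmetric.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.ToDoubleBiFunction;

/**
 * Hypothetical memoizing wrapper around a user-user similarity function.
 * Not Mahout code; it only illustrates trading memory for repeated computation.
 */
final class MemoizedSimilarity {

  private record Pair(long a, long b) {}

  private final ToDoubleBiFunction<Long, Long> delegate;
  private final Map<Pair, Double> cache = new ConcurrentHashMap<>();

  MemoizedSimilarity(ToDoubleBiFunction<Long, Long> delegate) {
    this.delegate = delegate;
  }

  double similarity(long userA, long userB) {
    // Assumes sim(a, b) == sim(b, a), so both orders share one cache entry.
    Pair key = userA <= userB ? new Pair(userA, userB) : new Pair(userB, userA);
    return cache.computeIfAbsent(key, k -> delegate.applyAsDouble(k.a(), k.b()));
  }

  int cachedPairs() {
    return cache.size();
  }
}
```

The cost is memory: in the worst case the map holds one entry per pair of users, which grows quadratically with the number of users, and it would have to be cleared whenever the underlying preferences change.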
(4) GenericItemBasedRecommender Prediction calculation:
I noticed that Mahout has implemented the following item-based calculation
that computes a weighted average of the user’s ratings for similar items,
preference += theSimilarity * prefs.getValue(i);
totalSimilarity += theSimilarity; // Weights can be negative!
... and later,
float estimate = (float) (preference / totalSimilarity);
Can you provide some insight as to why you decided to use this approach? Were
there any other approaches you considered but rejected, and if so, why did
you reject them?
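To illustrate the "Weights can be negative!" comment, here is a small sketch comparing the signed denominator from the snippet with one alternative from the literature (for example, Sarwar et al.'s item-based paper), which divides by the sum of absolute similarities. Both methods and their names are my own illustration, not Mahout code; `sims[i]` and `prefs[i]` are assumed to be parallel arrays over the items the user has rated.

```java
/**
 * Compares two normalizations of an item-based weighted average.
 * sims[i]: similarity of rated item i to the target item (hypothetical input).
 * prefs[i]: the user's rating of item i.
 */
final class ItemBasedEstimate {

  /** Signed normalization, as in the snippet: sum(sim * pref) / sum(sim). */
  static double signedEstimate(double[] sims, double[] prefs) {
    double preference = 0.0;
    double totalSimilarity = 0.0;
    for (int i = 0; i < sims.length; i++) {
      preference += sims[i] * prefs[i];
      totalSimilarity += sims[i]; // negative weights shrink the denominator
    }
    // No guard: a zero total yields Infinity or NaN.
    return preference / totalSimilarity;
  }

  /** Absolute normalization: sum(sim * pref) / sum(|sim|). */
  static double absoluteEstimate(double[] sims, double[] prefs) {
    double preference = 0.0;
    double totalAbsSimilarity = 0.0;
    for (int i = 0; i < sims.length; i++) {
      preference += sims[i] * prefs[i];
      totalAbsSimilarity += Math.abs(sims[i]); // never decreases
    }
    return totalAbsSimilarity == 0.0 ? Double.NaN : preference / totalAbsSimilarity;
  }
}
```

With similarities 0.6 and -0.5 for items the user rated 5 and 1, the signed version gives 2.5 / 0.1 = 25, far outside a 1-5 scale, while the absolute version gives 2.5 / 1.1 ≈ 2.27. The absolute form bounds the estimate's magnitude by the largest rating, though with negative weights it can still fall below the bottom of the scale.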
Thanks, Carlos