I'm working on a common kind of dataset that contains a user id, an item id, and
a timestamp (the moment the user bought the item). As there are no preference
values, I needed a binary item-based recommender, which I found in Mahout
(GenericBooleanPrefItemBasedRecommender with the Tanimoto coefficient).
Following the recommender documentation, I tried to evaluate it with
GenericRecommenderIRStatsEvaluator, but I ran into a few problems.
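For reference, here is roughly how I set things up (a minimal sketch; the file
name "purchases.csv", the 'at' value of 3 and evaluationPercentage of 1.0 are
just placeholder values, not my actual configuration):

    import java.io.File;
    import org.apache.mahout.cf.taste.common.TasteException;
    import org.apache.mahout.cf.taste.eval.IRStatistics;
    import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
    import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
    import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefItemBasedRecommender;
    import org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.recommender.Recommender;
    import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

    public class BooleanPrefEval {
      public static void main(String[] args) throws Exception {
        // CSV with userID,itemID lines (placeholder file name)
        DataModel model = new FileDataModel(new File("purchases.csv"));

        // Item-based recommender for boolean data, Tanimoto similarity
        RecommenderBuilder builder = new RecommenderBuilder() {
          public Recommender buildRecommender(DataModel model) throws TasteException {
            ItemSimilarity similarity = new TanimotoCoefficientSimilarity(model);
            return new GenericBooleanPrefItemBasedRecommender(model, similarity);
          }
        };

        RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
        // at = 3 and evaluationPercentage = 1.0 are arbitrary example values
        IRStatistics stats = evaluator.evaluate(builder, null, model, null, 3,
            GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
        System.out.println("precision=" + stats.getPrecision()
            + " recall=" + stats.getRecall());
      }
    }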
In fact, correct me if I'm wrong, but it seems to me that the evaluator will
invariably give the same value for precision and recall. Since every item is
rated with the binary value 1.0, we give the evaluator a relevance threshold
lower than 1, so for each user 'at' items are considered relevant and removed
from the user's preferences, and the recommender is then asked for 'at'
recommendations. Precision and recall are then computed from these two sets,
relevant and retrieved items, which both contain 'at' elements. That leads
(unless, I guess, the recommender cannot come up with 'at' items) to precision
and recall being equal.
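To make the arithmetic concrete with made-up numbers: with at = 3, three items
are held out as the relevant set and (up to) three items are recommended; if
two of the recommendations are among the held-out items, precision = 2/3 and
recall = 2/3. The two measures could only differ when the recommender returns
fewer than 'at' items.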
The results are still useful though: a precision of 0.2 tells us that, among
the 'at' recommended items, 20% were actually bought by the user. One can
wonder whether those items are the best possible recommendations, but the
least we can say is that they somehow correspond to the user's preferences.
However, I had a few ideas to give more meaning to precision and recall that
I wanted to share, to get some advice before implementing them.
I read this topic and I fully understand that IRStatsEvaluator is different
from classic evaluators (which give the MAE, for example), but I feel it would
make sense to have a trainingPercentage parameter that divides each user's
preferences into two subsets of items. The first subset (typically 20%) would
be considered the relevant items, to be predicted from the second subset. At
the moment this split is defined by 'at', which often results in equal numbers
of items in the relevant and retrieved sets. The 'at' value would still be a
parameter, but used only to define the number of items retrieved. The
evaluator could then be run while varying these two parameters to find the
best compromise between precision and recall.
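To illustrate, here is a rough sketch of the per-user split I have in mind
(the names trainingPercentage, numRelevant, etc. are mine, not existing Mahout
API; only the DataModel calls are real):

    // Hold out (1 - trainingPercentage) of the user's preferences as the
    // relevant set and train on the rest; 'at' would only control how many
    // items are retrieved afterwards.
    PreferenceArray prefs = dataModel.getPreferencesFromUser(userID);
    int numRelevant = (int) Math.round(prefs.length() * (1.0 - trainingPercentage));
    FastIDSet relevantItemIDs = new FastIDSet();
    for (int i = 0; i < prefs.length() && relevantItemIDs.size() < numRelevant; i++) {
      relevantItemIDs.add(prefs.getItemID(i));
    }
    // ... then build a training DataModel without these items, ask the
    // recommender for 'at' items, and compute precision/recall against
    // relevantItemIDs as usual.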
Furthermore, when the dataset contains a timestamp for each purchase, would it
not be logical to use the last items bought by the user as the test set? The
evaluator would then mirror what happens in real use.
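If that sounds reasonable, a possible way to pick that test set (again only a
sketch, assuming the DataModel can return purchase times through
getPreferenceTime, and reusing the hypothetical numRelevant from above):

    // Sort the user's items by purchase time, most recent first, and take the
    // first numRelevant of them as the held-out relevant set.
    PreferenceArray prefs = dataModel.getPreferencesFromUser(userID);
    final List<Long> itemIDs = new ArrayList<Long>();
    for (int i = 0; i < prefs.length(); i++) {
      itemIDs.add(prefs.getItemID(i));
    }
    final DataModel dm = dataModel;
    final long uid = userID;
    Collections.sort(itemIDs, new Comparator<Long>() {
      public int compare(Long a, Long b) {
        try {
          // most recent purchases first
          return dm.getPreferenceTime(uid, b).compareTo(dm.getPreferenceTime(uid, a));
        } catch (TasteException e) {
          throw new IllegalStateException(e);
        }
      }
    });
    // itemIDs.subList(0, numRelevant) would then be the test/relevant items.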
Finally, I believe the documentation page has some mistakes in the last code
excerpt:
evaluator.evaluate(builder, myModel, null, 3,
    RecommenderIRStatusEvaluator.CHOOSE_THRESHOLD, 1.0);
should be
evaluator.evaluate(builder, null, myModel, null, 3,
GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
Thanks for your help!