On Mon, Jul 25, 2011 at 10:05 AM, MT <[email protected]> wrote:

> In fact, correct me if I'm wrong, but to me the evaluator will invariably
> give us the same value for precision and recall. Since the items are all
> rated with the binary 1.0 value, we give the recommender a threshold lower
> than 1, thus for each user "at" items are considered relevant and removed
> from the user's preferences to compute "at" recommendations. Precision and
> recall are then computed with the two sets: relevant and retrieved items.
> Which leads (I guess unless the recommender cannot compute "at" items) to
> precision and recall being equal.
>
I think that's right in this case, where there are no ratings. It's pretty
artificial to define 'relevant' here based on ratings! This isn't true if
you have ratings.

> Results are still useful though, since a value of 0.2 for precision tells
> us that among the "at" recommended items, 20% were effectively bought by
> the user. Although one can wonder if those items are the best
> recommendations, the least we can say is that it somehow corresponds to
> the user's preferences.

Right.

> I read this topic and I fully understand that IRStatsEvaluator is different
> from classic evaluators (giving the MAE for example), but I feel that it
> makes sense to have a parameter trainingPercentage that divides users'
> preferences in two subsets of items. The first (typically 20%) are
> considered as relevant items, which are to be predicted using the second
> subset. This task is at the moment defined by "at", resulting in often
> equal numbers of items in the relevant and retrieved subset. This "at"
> value would still be a parameter used to define the number of items
> retrieved. The evaluator could then be run varying these two parameters to
> find the best compromise between precision and recall.

I think it already has this parameter? It already accepts an "at" value. Is
this what you mean? Maybe an example or patch would clarify.

> Furthermore, should the dataset contain a timestamp for each purchase,
> would it not be logical to set the test set as the last items bought by
> the user? The evaluator would then follow what happens in real
> calculations.

Yes, that sounds like a great improvement. The only difficulty is including
it in a clean way. Up for a patch?
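
As a possible starting point for such a patch, here is a rough sketch of the time-based split on its own, using only the JDK. The class TimeSplitSketch, the Purchase type, and the heldOut/training methods are hypothetical names made up for illustration; none of them exist in Mahout, and how this would be wired into the evaluator is left open.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class TimeSplitSketch {

    /** Hypothetical purchase event: an item ID plus the time it was bought. */
    public static final class Purchase {
        public final long itemId;
        public final long timestamp;

        public Purchase(long itemId, long timestamp) {
            this.itemId = itemId;
            this.timestamp = timestamp;
        }
    }

    /** Copies the purchases and sorts the copy from oldest to newest. */
    private static List<Purchase> sortedByTime(List<Purchase> purchases) {
        List<Purchase> sorted = new ArrayList<>(purchases);
        sorted.sort(Comparator.comparingLong((Purchase p) -> p.timestamp));
        return sorted;
    }

    /** Item IDs of the user's `at` most recent purchases (the test set). */
    public static List<Long> heldOut(List<Purchase> purchases, int at) {
        List<Purchase> sorted = sortedByTime(purchases);
        int from = Math.max(0, sorted.size() - at);
        List<Long> ids = new ArrayList<>();
        for (Purchase p : sorted.subList(from, sorted.size())) {
            ids.add(p.itemId);
        }
        return ids;
    }

    /** Item IDs of everything bought before the held-out purchases. */
    public static List<Long> training(List<Purchase> purchases, int at) {
        List<Purchase> sorted = sortedByTime(purchases);
        int to = Math.max(0, sorted.size() - at);
        List<Long> ids = new ArrayList<>();
        for (Purchase p : sorted.subList(0, to)) {
            ids.add(p.itemId);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Purchase> purchases = Arrays.asList(
                new Purchase(10L, 300L),
                new Purchase(11L, 100L),
                new Purchase(12L, 200L));
        System.out.println("test set: " + heldOut(purchases, 2));
        System.out.println("training set: " + training(purchases, 2));
    }
}
```

The recommender would then be trained on the training items only and asked for "at" recommendations, which are compared against the held-out items.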

> Finally, I believe the documentation page has some mistakes in the last
> code excerpt:
>
> evaluator.evaluate(builder, myModel, null, 3,
>     RecommenderIRStatusEvaluator.CHOOSE_THRESHOLD,
>     1.0);
>
> should be
>
> evaluator.evaluate(builder, null, myModel, null, 3,
>     GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);

OK, will look at that.
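
PS: on the precision/recall point at the top of this message, a small standalone sketch may make the arithmetic concrete. It is not the evaluator's actual code; PrecisionRecallSketch and its methods are hypothetical names, and it assumes the retrieved list contains no duplicates. With the same number of hits, precision divides by the number of retrieved items and recall by the number of relevant items, so the two coincide whenever both sets have size "at".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PrecisionRecallSketch {

    /** Number of retrieved items that are also relevant. */
    private static int hits(Set<Long> relevant, List<Long> retrieved) {
        int count = 0;
        for (Long itemId : retrieved) {
            if (relevant.contains(itemId)) {
                count++;
            }
        }
        return count;
    }

    /** Fraction of retrieved items that are relevant (0 if nothing was retrieved). */
    public static double precision(Set<Long> relevant, List<Long> retrieved) {
        return retrieved.isEmpty() ? 0.0 : (double) hits(relevant, retrieved) / retrieved.size();
    }

    /** Fraction of relevant items that were retrieved (0 if nothing is relevant). */
    public static double recall(Set<Long> relevant, List<Long> retrieved) {
        return relevant.isEmpty() ? 0.0 : (double) hits(relevant, retrieved) / relevant.size();
    }

    public static void main(String[] args) {
        // 5 relevant items held out, 5 recommendations, 1 hit: both denominators are 5.
        Set<Long> relevant = new HashSet<>(Arrays.asList(1L, 2L, 3L, 4L, 5L));
        List<Long> fullList = Arrays.asList(1L, 9L, 8L, 7L, 6L);
        System.out.println(precision(relevant, fullList) + " " + recall(relevant, fullList));

        // The recommender only managed 4 recommendations: the denominators now differ.
        List<Long> shortList = Arrays.asList(1L, 9L, 8L, 7L);
        System.out.println(precision(relevant, shortList) + " " + recall(relevant, shortList));
    }
}
```

The second case in main corresponds to the exception MT mentions: if the recommender cannot produce all "at" items, the retrieved set is smaller and the two values diverge.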
