You mean, have the user specify all items that are considered relevant? Yes, that could be useful. Do you have a patch in mind?
Your analysis is correct, and I would not call it a bug. It's a symptom of how little information the evaluation has to work with here without ratings. It has to pick random items as "relevant", for starters. It's another reason your idea is good, to let the user specify those relevant items.

On Thu, Jul 21, 2011 at 1:49 PM, Marko Ciric <[email protected]> wrote:
> Hi guys,
>
> I wonder if Mahout should have a "precision and recall" evaluator that
> calculates the relevant items data set without looking to the relevance
> threshold. This would be suitable for data sets with boolean preference
> nature. In addition, the relevant items can be removed from the training
> data set by random (removing first couple of preferred items every time
> wouldn't be a great idea).
>
> On the other hand, having relevance threshold
> with RecommenderIRStatsEvaluator set to 1.0 removes exactly "at" number of
> items. As the recommender returns that number of items, the precision and
> recall would have the same value. Is this Ok or is it a bug, given that
> precision = intersection / num_recommended_items (where
> num_recommended_items is almost always "at")
> recall = intersection / num_relevant_items (also "at" as the previously
> mentioned why relevanceThreshold is 1.0)?
>
>
> --
> Marko Ćirić
> [email protected]
>
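
To make the arithmetic in the quoted question concrete, here is a small standalone sketch. It is not Mahout code: the helper precision_recall and the item IDs are made up for illustration, and it only assumes the two formulas quoted above. When both the number of held-out relevant items and the number of recommendations equal "at", the two denominators match and precision equals recall. They diverge only when the recommender returns fewer than "at" items.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for one user's recommendation list.

    Hypothetical helper for illustration only, not part of Mahout.
    """
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else float("nan")
    recall = hits / len(relevant) if relevant else float("nan")
    return precision, recall


at = 5

# Boolean data with relevanceThreshold = 1.0: exactly `at` items are held out.
relevant = ["i1", "i2", "i3", "i4", "i5"]

# The recommender returns a full list of `at` items, two of which are hits.
recommended = ["i2", "i9", "i4", "i7", "i8"]
p, r = precision_recall(recommended, relevant)
print(p, r)  # both are 2 / 5

# If the recommender can only produce 3 items, the denominators differ.
p_short, r_short = precision_recall(["i2", "i9", "i4"], relevant)
print(p_short, r_short)  # 2 / 3 versus 2 / 5
```

With user-specified relevant items, as proposed earlier in the thread, the size of the relevant set would no longer be tied to "at", so the two measures would generally differ.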
