A related question, please: does Mahout remove the 16% of "good" items before recommending, and then use the remaining 84% to predict that 16%?
Many thanks!

On Thu, Aug 9, 2012 at 11:20 AM, ziad kamel <[email protected]> wrote:
> Thanks Sean!
>
> Please correct me: when selecting the 16% of items we use the top items,
> but when comparing with the recommended items we don't use a sorted
> list. In other words, we just compare 2 lists?
>
> How does Mahout deal with these 2 cases?
>
> Case 1: the user has many items. Assume 1000 items, so if we recommend 5
> good items from the 160 items we will get a precision of 100%? Is
> that OK?
>
> Case 2: the user has fewer than 7 items. Assume 5 items; in this case
> there won't be top items in the list, so the user won't get any
> recommendation and no precision? Do we need to select another
> threshold, like 50%?
>
>
>
> On Thu, Aug 9, 2012 at 10:52 AM, Sean Owen <[email protected]> wrote:
>> Hi Ziad, I did answer your last question on this list -- don't see this one
>> previously though.
>>
>> The "relevant" items are chosen as those whose pref value exceeds some given
>> threshold. The default threshold is the mean of all 100 pref values plus
>> one standard deviation. Assuming the prefs are about normally distributed
>> about the mean (a significant assumption), and because 84% of the data
>> should therefore fall below mean plus 1 standard deviation, that means you
>> pick about the top 16% (16 of 100) items as relevant.
>>
>> Yes, your interpretation of precision is correct.
>>
>> On Thu, Aug 9, 2012 at 4:12 PM, ziad kamel <[email protected]> wrote:
>>
>>> Hi, I asked this question a few months ago with no answer. Hopefully
>>> someone can help.
>>>
>>> When not using a threshold, the default is to use the average rating plus
>>> one standard deviation, which equals 16%. Assume that a user has
>>> 100 items. Does that mean that his good recommendations are the top 16
>>> items? In case we use precision at 5, we are going to select only the top 5
>>> items from the 100. So is the precision going to be how many among the
>>> 16 items are in the 5 items? Assume that we get 4 from the 16 in the list
>>> of 5, the precision will be 80%?
>>>
>>> IRStatistics stats = evaluator.evaluate(recommenderBuilder, null,
>>>     model, null, 5,
>>>     GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
>>>
>>> Thanks!
>>>
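
To make the arithmetic discussed above concrete, here is a minimal plain-Java sketch of the two steps as described in this thread: pick the "relevant" items as those whose pref value exceeds mean plus one standard deviation, then count how many of the top N recommended items fall in that set. It does not use or reproduce Mahout's code. The class name RelevanceSketch and its methods are hypothetical, the population standard deviation is an assumption (Mahout's own computation may differ), and dividing by the number of items actually considered when fewer than N are recommended is also an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not Mahout's implementation.
class RelevanceSketch {

    // Threshold = mean of the user's pref values plus one standard deviation.
    // Assumption: population standard deviation (divide by n).
    static double chooseThreshold(double[] prefs) {
        double sum = 0.0;
        for (double p : prefs) {
            sum += p;
        }
        double mean = sum / prefs.length;
        double squaredDiffs = 0.0;
        for (double p : prefs) {
            squaredDiffs += (p - mean) * (p - mean);
        }
        return mean + Math.sqrt(squaredDiffs / prefs.length);
    }

    // "Relevant" items are those whose pref value exceeds the threshold.
    static Set<Long> relevantItems(long[] itemIds, double[] prefs, double threshold) {
        Set<Long> relevant = new HashSet<>();
        for (int i = 0; i < itemIds.length; i++) {
            if (prefs[i] > threshold) {
                relevant.add(itemIds[i]);
            }
        }
        return relevant;
    }

    // Precision at n: fraction of the first n recommended items that are relevant.
    // The relevant set is unordered, so this is a membership test, not a rank comparison.
    static double precisionAtN(List<Long> recommended, Set<Long> relevant, int n) {
        int considered = Math.min(n, recommended.size());
        if (considered == 0) {
            return Double.NaN; // nothing recommended, precision undefined
        }
        int hits = 0;
        for (int i = 0; i < considered; i++) {
            if (relevant.contains(recommended.get(i))) {
                hits++;
            }
        }
        return (double) hits / considered;
    }

    public static void main(String[] args) {
        // Made-up data: ten items with made-up pref values.
        long[] itemIds = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        double[] prefs = {5, 5, 4, 3, 3, 3, 3, 2, 2, 1};

        double threshold = chooseThreshold(prefs);
        Set<Long> relevant = relevantItems(itemIds, prefs, threshold);

        // Made-up recommendation list of five item IDs.
        List<Long> recommended = Arrays.asList(1L, 2L, 7L, 8L, 9L);

        System.out.println("threshold = " + threshold);
        System.out.println("relevant = " + relevant);
        System.out.println("precision@5 = " + precisionAtN(recommended, relevant, 5));
    }
}
```

With a recommendation list of 5 in which 4 items are in the relevant set, precisionAtN returns 4/5, which matches the 80% figure in the original question.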
