Hi Sebastian,

I have tried Mahout 0.6 SNAPSHOT and it's great: the test results of the
RecommenderJob show a huge performance boost, and the issue described in this
mail thread no longer occurs. Thanks.

Cheers,
Ramon
 > Date: Fri, 21 Oct 2011 09:06:50 +0200
> From: [email protected]
> To: [email protected]
> Subject: Re: Recommend result contains item which user has already given 
> preference, is that correct?
> 
> As I already said multiple times, please use Mahout 0.6. It contains bug
> fixes and performance improvements for this particular job.
> 
> --sebastian
> 
> On 21.10.2011 09:04, WangRamon wrote:
> > 
> > Hi Sebastian, I made the following change locally (against Mahout 0.5) to
> > resolve the issue. Maybe I'm wrong, but the test result is correct:
> >
> > 1) Add an "int itemIdIndex" property with getter/setter methods to class
> >    PrefAndSimilarityColumnWritable. It holds the index of the item that the
> >    "prefValue" in this class refers to.
> >
> > 2) Add "prefAndSimilarityColumn.setItemIdIndex(key.get());" at line 51 of
> >    class PartialMultiplyMapper to set the property created in step 1.
> >
> > 3) In class AggregateAndRecommendReducer, add the following code at line 147:
> >
> >       // item the user has already given a preference for
> >       int itemIdIndex = prefAndSimilarityColumn.getItemIdIndex();
> >       // exclude the item the user has already given a preference for
> >       simColumn.set(itemIdIndex, Double.NaN);
> >
> >    This sets the entry for that item in the similarity column to NaN. Adding
> >    or multiplying this vector later also yields NaN at that index, so the
> >    items the user has already expressed a preference for are excluded from
> >    the recommendations.
> >
> > 4) At line 173 of the same class, AggregateAndRecommendReducer, add a check
> >    that sets the prediction to NaN for items the user has already rated:
> >
> >       double prediction = Double.NaN;
> >       if (!Double.isNaN(element.get())) {
> >         prediction = element.get() / denominators.getQuick(itemIDIndex);
> >       }
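Steps 3 and 4 can be illustrated outside Hadoop with plain arrays. This is a hypothetical sketch, not Ramon's patch or Mahout code: `double[]` stands in for Mahout's vectors, the class and method names are invented, and the denominator is assumed to be the sum of absolute similarities.

```java
import java.util.Arrays;

/** Hypothetical sketch of masking already-rated items (not Mahout's API). */
public class MaskPreferredSketch {

  /** Step 3: copy the column and set the rated item's own index to NaN. */
  static double[] maskOwnIndex(double[] simColumn, int itemIdIndex) {
    double[] masked = simColumn.clone();
    masked[itemIdIndex] = Double.NaN;
    return masked;
  }

  /** Step 4: divide only when the numerator is a real number. */
  static double predict(double numerator, double denominator) {
    double prediction = Double.NaN;
    if (!Double.isNaN(numerator)) {
      prediction = numerator / denominator;
    }
    return prediction;
  }

  /**
   * Weighted average over the similarity columns of the rated items.
   * Assumption: the denominator is the sum of absolute similarities.
   */
  static double[] predictAll(double[][] simColumns, int[] itemIdIndexes, double[] prefValues) {
    int numItems = simColumns[0].length;
    double[] numerators = new double[numItems];
    double[] denominators = new double[numItems];
    for (int p = 0; p < simColumns.length; p++) {
      double[] masked = maskOwnIndex(simColumns[p], itemIdIndexes[p]);
      for (int i = 0; i < numItems; i++) {
        numerators[i] += masked[i] * prefValues[p];    // NaN spreads to the rated item's slot
        denominators[i] += Math.abs(masked[i]);        // Math.abs(NaN) is NaN as well
      }
    }
    double[] predictions = new double[numItems];
    for (int i = 0; i < numItems; i++) {
      predictions[i] = predict(numerators[i], denominators[i]);
    }
    return predictions;
  }

  public static void main(String[] args) {
    // Hypothetical columns for the two rated items (1 and 3); the self-similarity
    // of 1.0 is deliberately left in so the mask has something to remove.
    double[][] simColumns = {
        {0.5, 1.0, 0.25, 0.5},
        {0.25, 0.5, 0.75, 1.0},
    };
    double[] predictions = predictAll(simColumns, new int[] {1, 3}, new double[] {4.0, 2.0});
    // Items 1 and 3 come out NaN; items 0 and 2 get real predictions.
    System.out.println(Arrays.toString(predictions));
  }
}
```

The guard in `predict` is not strictly necessary, because dividing NaN by anything already yields NaN; it only makes the intent explicit.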
> > With these changes, I get the correct recommendations. I have thought it
> > through carefully, but I may still be wrong, so I'd be glad to hear your
> > opinion. And again, thank you very much.
> >
> > Cheers,
> > Ramon
> >> From: [email protected]
> >> To: [email protected]
> >> Subject: RE: Recommend result contains item which user has already given 
> >> preference, is that correct?
> >> Date: Fri, 21 Oct 2011 10:01:12 +0800
> >>
> >>
> >> Hi Sebastian, unfortunately I still get wrong data from the RecommenderJob
> >> after cleaning everything. Here is part of the recommendation result:
> >>
> >> 49 [300420:5.0,312611:5.0,428914:5.0,208617:5.0,345206:5.0,411909:5.0,363683:5.0,248872:5.0,93087:5.0,494200:5.0]
> >>
> >> Now look at the input data for user 49 below: items 312611, 428914, 208617,
> >> 345206, 411909, 363683, 248872 and 494200 are wrong recommendations because
> >> the user has already rated them, so nearly all of the results are wrong. I
> >> would like to send you the test data, but it is more than 50 MB. Can we
> >> discuss this offline? Thank you very much.
> >>
> >> 49,409769,4
> >> 49,98795,4
> >> 49,262163,1
> >> 49,66009,4
> >> 49,414484,2
> >> 49,405329,3
> >> 49,312611,1
> >> 49,336441,4
> >> 49,136494,5
> >> 49,345206,3
> >> 49,479179,1
> >> 49,318960,4
> >> 49,52683,3
> >> 49,270840,3
> >> 49,264828,1
> >> 49,222390,4
> >> 49,456614,5
> >> 49,436207,5
> >> 49,306308,2
> >> 49,391582,5
> >> 49,494200,4
> >> 49,423328,3
> >> 49,112997,3
> >> 49,229347,5
> >> 49,474928,3
> >> 49,349350,1
> >> 49,208508,3
> >> 49,314397,2
> >> 49,14673,2
> >> 49,496041,4
> >> 49,301875,4
> >> 49,234234,1
> >> 49,325287,3
> >> 49,35756,5
> >> 49,365097,4
> >> 49,13376,4
> >> 49,333634,2
> >> 49,283494,5
> >> 49,208617,3
> >> 49,245390,1
> >> 49,221804,2
> >> 49,347821,3
> >> 49,138954,5
> >> 49,164206,5
> >> 49,72238,1
> >> 49,356632,1
> >> 49,452296,3
> >> 49,182288,5
> >> 49,499031,5
> >> 49,150727,4
> >> 49,240533,5
> >> 49,326081,4
> >> 49,220683,2
> >> 49,196527,2
> >> 49,177165,3
> >> 49,411709,5
> >> 49,360722,3
> >> 49,466310,1
> >> 49,160375,2
> >> 49,137203,5
> >> 49,32634,4
> >> 49,62134,5
> >> 49,96982,5
> >> 49,196951,1
> >> 49,304155,5
> >> 49,406109,4
> >> 49,244276,5
> >> 49,189552,1
> >> 49,442215,3
> >> 49,268806,2
> >> 49,364912,2
> >> 49,410896,5
> >> 49,450602,5
> >> 49,151703,1
> >> 49,248872,4
> >> 49,21684,1
> >> 49,41196,1
> >> 49,26614,2
> >> 49,369075,5
> >> 49,321916,1
> >> 49,325081,1
> >> 49,329877,4
> >> 49,344661,4
> >> 49,8429,3
> >> 49,69279,1
> >> 49,143695,1
> >> 49,229120,2
> >> 49,26298,4
> >> 49,54456,1
> >> 49,75937,4
> >> 49,87042,3
> >> 49,345383,5
> >> 49,363683,4
> >> 49,128047,3
> >> 49,234878,5
> >> 49,428914,3
> >> 49,353107,2
> >> 49,266850,4
> >> 49,421211,3
> >> 49,265739,4
> >> 49,303723,1
> >> 49,244575,4
> >> 49,303625,4
> >> 49,350481,5
> >> 49,63985,4
> >> 49,207327,3
> >> 49,397535,1
> >> 49,300916,5
> >> 49,358094,4
> >> 49,314919,5
> >> 49,309355,5
> >> 49,403169,5
> >> 49,90148,4
> >> 49,224056,4
> >> 49,359181,2
> >> 49,341927,5
> >> 49,436521,4
> >> 49,480682,4
> >> 49,315561,3
> >> 49,218647,5
> >> 49,245276,2
> >> 49,93189,1
> >> 49,204695,4
> >> 49,498350,5
> >> 49,155787,3
> >> 49,112730,3
> >> 49,416756,2
> >> 49,411909,4
> >> 49,253353,2
> >> 49,196663,5
> >> 49,40903,3
> >> 49,51873,2
> >> 49,66925,3
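The check Ramon does by eye can be automated by intersecting the recommended item IDs with the items the user has already rated. This is a minimal Java sketch. It assumes the text format shown above: a user ID, whitespace, then a bracketed `itemID:score` list, with comma-separated `userID,itemID,pref` input lines. The class and method names are hypothetical and are not part of Mahout.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: which recommended items did the user already rate? */
public class AlreadyRatedCheck {

  /** Parses "userID [item:score,item:score,...]" into the recommended item IDs. */
  static List<Long> recommendedItems(String line) {
    String body = line.trim().split("\\s+", 2)[1].trim();
    body = body.substring(1, body.length() - 1); // strip the surrounding [ and ]
    List<Long> items = new ArrayList<>();
    for (String entry : body.split(",")) {
      items.add(Long.parseLong(entry.substring(0, entry.indexOf(':'))));
    }
    return items;
  }

  /** Collects the item IDs rated by userId from "user,item,pref" lines. */
  static Set<Long> ratedItems(long userId, List<String> prefLines) {
    Set<Long> rated = new HashSet<>();
    for (String prefLine : prefLines) {
      String[] fields = prefLine.split(",");
      if (Long.parseLong(fields[0].trim()) == userId) {
        rated.add(Long.parseLong(fields[1].trim()));
      }
    }
    return rated;
  }

  /** Returns the recommended items the user had already rated, in ranking order. */
  static List<Long> alreadyRated(String recommendationLine, List<String> prefLines) {
    long userId = Long.parseLong(recommendationLine.trim().split("\\s+", 2)[0]);
    Set<Long> rated = ratedItems(userId, prefLines);
    List<Long> offending = new ArrayList<>();
    for (long item : recommendedItems(recommendationLine)) {
      if (rated.contains(item)) {
        offending.add(item);
      }
    }
    return offending;
  }

  public static void main(String[] args) {
    String rec = "49 [300420:5.0,312611:5.0,428914:5.0,208617:5.0,345206:5.0,"
        + "411909:5.0,363683:5.0,248872:5.0,93087:5.0,494200:5.0]";
    // A few of user 49's input lines from the message above.
    List<String> prefs = List.of("49,312611,1", "49,345206,3", "49,494200,4", "49,409769,4");
    System.out.println(alreadyRated(rec, prefs)); // prints [312611, 345206, 494200]
  }
}
```

If the same check is run over the full input and output files, an empty result for every user is what a correct RecommenderJob run should produce.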
> >>> Date: Thu, 20 Oct 2011 18:40:38 +0200
> >>> From: [email protected]
> >>> To: [email protected]
> >>> Subject: Re: Recommend result contains item which user has already given 
> >>> preference, is that correct?
> >>>
> >>> To put it simply:
> >>>
> >>> The vector of recommendations is the sum of the similarity vectors for
> >>> all preferred items. In each similarity vector for a preferred item the
> >>> entry for that particular item is set to NaN.
> >>>
> >>> That means that in the recommendation vector the entries for all
> >>> preferred items will be NaN.
> >>>
> >>> It's a neat trick that is unfortunately very hard to see in the code.
> >>>
> >>> --sebastian
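Sebastian's summary can be made concrete with plain arrays. The Java sketch below is illustrative only. It uses `double[]` in place of Mahout's `Vector` and assumes a symmetric similarity matrix, so an item's row equals its column. The class and method names are hypothetical. It mirrors the idea, not Mahout's implementation.

```java
import java.util.Arrays;

/** Hypothetical sketch of the NaN trick with dense arrays (not Mahout's API). */
public class NanTrickSketch {

  /** Mirrors "remove self similarity": each item's entry for itself becomes NaN. */
  static void removeSelfSimilarity(double[][] similarity) {
    for (int item = 0; item < similarity.length; item++) {
      similarity[item][item] = Double.NaN;
    }
  }

  /** Sums the pref-weighted similarity vectors of all preferred items. */
  static double[] recommendationVector(double[][] similarity, int[] preferredItems,
                                       double[] prefValues) {
    double[] sum = new double[similarity.length];
    for (int p = 0; p < preferredItems.length; p++) {
      double[] simVector = similarity[preferredItems[p]];
      for (int other = 0; other < sum.length; other++) {
        // NaN plus anything stays NaN, so every preferred item's entry ends up NaN.
        sum[other] += simVector[other] * prefValues[p];
      }
    }
    return sum;
  }

  public static void main(String[] args) {
    // Hypothetical symmetric similarities between four items.
    double[][] similarity = {
        {1.0, 0.8, 0.1, 0.4},
        {0.8, 1.0, 0.5, 0.2},
        {0.1, 0.5, 1.0, 0.9},
        {0.4, 0.2, 0.9, 1.0},
    };
    removeSelfSimilarity(similarity);
    // The user prefers items 0 and 2.
    double[] scores = recommendationVector(similarity, new int[] {0, 2}, new double[] {5.0, 3.0});
    // Items 0 and 2 come out NaN; items 1 and 3 get real scores.
    System.out.println(Arrays.toString(scores));
  }
}
```

No explicit "has the user rated this?" check is needed. An item's own NaN only enters the sum when that item's similarity vector is added, and that happens exactly when the user prefers it.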
> >>>
> >>> On 20.10.2011 18:36, WangRamon wrote:
> >>>>
> >>>> Hi Sebastian
> >>>> "But as the entry for the item itself is set to NaN in its similarity
> >>>> vector and NaN plus something always stays NaN, the predicted preference
> >>>> for an item that was already preferred is NaN. And the NaN entries are
> >>>> dropped later."
> >>>> Wait a minute. I understand that NaN plus anything always stays NaN, but
> >>>> how do you explain "the predicted preference for an item that was already
> >>>> preferred is NaN"? Where is the code that checks whether an item was
> >>>> already preferred? The only NaN-related code in
> >>>> SimilarityMatrixRowWrapperMapper sets the similarity of an item to itself
> >>>> (A to A) to NaN, am I right?
> >>>> Thanks
> >>>> Ramon
> >>>>> Date: Thu, 20 Oct 2011 17:04:20 +0200
> >>>>> From: [email protected]
> >>>>> To: [email protected]
> >>>>> Subject: Re: Recommend result contains item which user has already 
> >>>>> given preference, is that correct?
> >>>>>
> >>>>> On 20.10.2011 16:57, WangRamon wrote:
> >>>>>>
> >>>>>> Hi Sebastian and Sean 
> >>>>>> Thanks for your help. 
> >>>>>>
> >>>>>> I re-read the code (setting up a debugging environment seems to be very
> >>>>>> difficult for me) and found the line in SimilarityMatrixRowWrapperMapper.
> >>>>>> I paste it below with its comment:
> >>>>>>     /* remove self similarity */ 
> >>>>>>     similarityMatrixRow.set(key.get(), Double.NaN); 
> >>>>>> I think this marks the similarity between item X and itself as NaN, but
> >>>>>> it doesn't exclude item X from the recommendations. Then
> >>>>>> AggregateAndRecommendReducer uses simColumn.times(prefValue) as part of
> >>>>>> the formula that estimates preferences for all items similar to item i
> >>>>>> (which could be item X or some other item), and returns the top 10
> >>>>>> (the default) for a user. During this process, I cannot see any code
> >>>>>> that excludes items the user has already rated from the recommendations.
> >>>>>
> >>>>> It's a little bit hidden :) For each preferred item, a vector of all its
> >>>>> similarities is added:
> >>>>>
> >>>>>       numerators = numerators == null
> >>>>>           ? prefValue == BOOLEAN_PREF_VALUE ? simColumn.clone() : simColumn.times(prefValue)
> >>>>>           : numerators.plus(prefValue == BOOLEAN_PREF_VALUE ? simColumn : simColumn.times(prefValue));
> >>>>>
> >>>>> But as the entry for the item itself is set to NaN in its similarity
> >>>>> vector and NaN plus something always stays NaN, the predicted preference
> >>>>> for an item that was already preferred is NaN. And the NaN entries are
> >>>>> dropped later.
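The thread does not show where the NaN entries are dropped. The following is an illustration only: a top-N selection that skips NaN scores before keeping the best N in a bounded min-heap. The heap built on `java.util.PriorityQueue` and the class name are assumptions for this sketch. They are not necessarily what Mahout's reducer does.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical top-N selection that drops NaN scores (not Mahout's code). */
public class TopNSketch {

  /** Returns up to n item indexes with the highest non-NaN scores, best first. */
  static List<Integer> topN(double[] scores, int n) {
    // Min-heap on score: the weakest of the current top n sits at the head.
    PriorityQueue<Integer> heap =
        new PriorityQueue<>(Comparator.comparingDouble((Integer i) -> scores[i]));
    for (int i = 0; i < scores.length; i++) {
      if (Double.isNaN(scores[i])) {
        continue; // already-preferred items carry NaN, so they never enter the heap
      }
      heap.offer(i);
      if (heap.size() > n) {
        heap.poll(); // evict the lowest score
      }
    }
    List<Integer> result = new ArrayList<>(heap);
    result.sort(Comparator.comparingDouble((Integer i) -> scores[i]).reversed());
    return result;
  }

  public static void main(String[] args) {
    double[] scores = {Double.NaN, 5.5, Double.NaN, 4.7, 2.0};
    System.out.println(topN(scores, 2)); // prints [1, 3]
  }
}
```

Skipping NaN explicitly also matters for correctness. A double comparator such as `Double.compare` orders NaN above every other value, so unfiltered NaN entries would otherwise rank first.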
> >>>>>
> >>>>> --sebastian
> >>>>>
> >>>>>
> >>>>>> Correct me if I'm missing something. Thank you, guys.
> >>>>>> Cheers,
> >>>>>> Ramon
> >>>>>>> Date: Thu, 20 Oct 2011 13:59:28 +0100
> >>>>>>> Subject: Re: Recommend result contains item which user has already 
> >>>>>>> given preference, is that correct?
> >>>>>>> From: [email protected]
> >>>>>>> To: [email protected]
> >>>>>>>
> >>>>>>> Ah OK, figured as much. WangRamon, does that answer your question?
> >>>>>>> Or can you debug to see whether or not this is happening in your use
> >>>>>>> case?
> >>>>>>>
> >>>>>>> On Thu, Oct 20, 2011 at 1:42 PM, Sebastian Schelter <[email protected]> 
> >>>>>>> wrote:
> >>>>>>>> It's still included in SimilarityMatrixRowWrapperMapper. We also have
> >>>>>>>> a unit test that checks whether a user is only recommended unknown
> >>>>>>>> items, and it still passes.