Send me the data and I will have a look at it. Can you tell me which arguments you invoke RecommenderJob with?
--sebastian

On 21.10.2011 04:01, WangRamon wrote:
>
> Hi Sebastian
> Unfortunately, I still get wrong results from the RecommenderJob after
> cleaning everything up. Here is part of the recommendation output:
> 49 [300420:5.0,312611:5.0,428914:5.0,208617:5.0,345206:5.0,411909:5.0,363683:5.0,248872:5.0,93087:5.0,494200:5.0]
> Now look at the input data for user 49: items 312611, 428914, 208617,
> 345206, 411909, 363683, 248872 and 494200 are wrong recommendations because
> the user has already rated them, so nearly all of them are wrong. I would
> like to send you the test data, but it is 50M+ in size. Can we discuss this
> offline? Thank you very much.
> 49,409769,4
> 49,98795,4
> 49,262163,1
> 49,66009,4
> 49,414484,2
> 49,405329,3
> 49,312611,1
> 49,336441,4
> 49,136494,5
> 49,345206,3
> 49,479179,1
> 49,318960,4
> 49,52683,3
> 49,270840,3
> 49,264828,1
> 49,222390,4
> 49,456614,5
> 49,436207,5
> 49,306308,2
> 49,391582,5
> 49,494200,4
> 49,423328,3
> 49,112997,3
> 49,229347,5
> 49,474928,3
> 49,349350,1
> 49,208508,3
> 49,314397,2
> 49,14673,2
> 49,496041,4
> 49,301875,4
> 49,234234,1
> 49,325287,3
> 49,35756,5
> 49,365097,4
> 49,13376,4
> 49,333634,2
> 49,283494,5
> 49,208617,3
> 49,245390,1
> 49,221804,2
> 49,347821,3
> 49,138954,5
> 49,164206,5
> 49,72238,1
> 49,356632,1
> 49,452296,3
> 49,182288,5
> 49,499031,5
> 49,150727,4
> 49,240533,5
> 49,326081,4
> 49,220683,2
> 49,196527,2
> 49,177165,3
> 49,411709,5
> 49,360722,3
> 49,466310,1
> 49,160375,2
> 49,137203,5
> 49,32634,4
> 49,62134,5
> 49,96982,5
> 49,196951,1
> 49,304155,5
> 49,406109,4
> 49,244276,5
> 49,189552,1
> 49,442215,3
> 49,268806,2
> 49,364912,2
> 49,410896,5
> 49,450602,5
> 49,151703,1
> 49,248872,4
> 49,21684,1
> 49,41196,1
> 49,26614,2
> 49,369075,5
> 49,321916,1
> 49,325081,1
> 49,329877,4
> 49,344661,4
> 49,8429,3
> 49,69279,1
> 49,143695,1
> 49,229120,2
> 49,26298,4
> 49,54456,1
> 49,75937,4
> 49,87042,3
> 49,345383,5
> 49,363683,4
> 49,128047,3
> 49,234878,5
> 49,428914,3
> 49,353107,2
> 49,266850,4
> 49,421211,3
> 49,265739,4
> 49,303723,1
> 49,244575,4
> 49,303625,4
> 49,350481,5
> 49,63985,4
> 49,207327,3
> 49,397535,1
> 49,300916,5
> 49,358094,4
> 49,314919,5
> 49,309355,5
> 49,403169,5
> 49,90148,4
> 49,224056,4
> 49,359181,2
> 49,341927,5
> 49,436521,4
> 49,480682,4
> 49,315561,3
> 49,218647,5
> 49,245276,2
> 49,93189,1
> 49,204695,4
> 49,498350,5
> 49,155787,3
> 49,112730,3
> 49,416756,2
> 49,411909,4
> 49,253353,2
> 49,196663,5
> 49,40903,3
> 49,51873,2
> 49,66925,3
>
> Date: Thu, 20 Oct 2011 18:40:38 +0200
>> From: [email protected]
>> To: [email protected]
>> Subject: Re: Recommend result contains item which user has already given
>> preference, is that correct?
>>
>> To put it simply:
>>
>> The vector of recommendations is the sum of the similarity vectors of
>> all preferred items. In the similarity vector of each preferred item, the
>> entry for that item itself is set to NaN.
>>
>> That means that in the recommendation vector the entries for all
>> preferred items will be NaN.
>>
>> It's a neat trick that is unfortunately very hard to see in the code.
>>
>> --sebastian
>>
>> On 20.10.2011 18:36, WangRamon wrote:
>>>
>>> Hi Sebastian
>>> "But as the entry for the item itself is set to NaN in its similarity vector
>>> and NaN plus something always stays NaN, the predicted preference for an
>>> item that was already preferred is NaN. And the NaN entries are dropped
>>> later."
>>> Wait a minute here. I understand that NaN plus something always stays NaN,
>>> but how do you explain "the predicted preference for an item that was
>>> already preferred is NaN"? Where is the code that checks whether an item
>>> was already preferred? The only thing about NaN in
>>> SimilarityMatrixRowWrapperMapper is to say that an item has a similarity
>>> of NaN to itself (A to A), am I right?
>>> Thanks
>>> Ramon
>>>> Date: Thu, 20 Oct 2011 17:04:20 +0200
>>>> From: [email protected]
>>>> To: [email protected]
>>>> Subject: Re: Recommend result contains item which user has already given
>>>> preference, is that correct?
>>>>
>>>> On 20.10.2011 16:57, WangRamon wrote:
>>>>>
>>>>> Hi Sebastian and Sean
>>>>> Thanks for your help.
>>>>>
>>>>> I re-read the code (setting up a debugging environment seems very
>>>>> difficult for me) and found the line in
>>>>> SimilarityMatrixRowWrapperMapper. I paste it below with its comment:
>>>>> /* remove self similarity */
>>>>> similarityMatrixRow.set(key.get(), Double.NaN);
>>>>> I think this marks the similarity between Item X and Item X
>>>>> (the identical one) as NaN, but it doesn't exclude Item X from the
>>>>> recommendations. Then AggregateAndRecommendReducer uses
>>>>> simColumn.times(prefValue) as part of the formula that calculates the
>>>>> preferences for all items similar to Item i (which could be Item X or
>>>>> some other item), and returns the top 10 (default) for a user.
>>>>> In this whole process I cannot see any code that excludes an item the
>>>>> user has already given a preference for from the recommendations.
>>>>
>>>> It's a little bit hidden :) For each preferred item, a vector of all its
>>>> similarities is added:
>>>>
>>>> numerators = numerators == null
>>>>     ? prefValue == BOOLEAN_PREF_VALUE ? simColumn.clone() : simColumn.times(prefValue)
>>>>     : numerators.plus(prefValue == BOOLEAN_PREF_VALUE ? simColumn : simColumn.times(prefValue));
>>>>
>>>> But as the entry for the item itself is set to NaN in its similarity
>>>> vector and NaN plus something always stays NaN, the predicted preference
>>>> for an item that was already preferred is NaN. And the NaN entries are
>>>> dropped later.
>>>>
>>>> --sebastian
>>>>
>>>>
>>>>> Correct me if I missed something. Thank you, guys.
>>>>> Cheers
>>>>> Ramon
>>>>>> Date: Thu, 20 Oct 2011 13:59:28 +0100
>>>>>> Subject: Re: Recommend result contains item which user has already given
>>>>>> preference, is that correct?
>>>>>> From: [email protected]
>>>>>> To: [email protected]
>>>>>>
>>>>>> Ah OK, figured as much. WangRamon, does that answer your question,
>>>>>> and/or can you debug to see whether or not this is happening
>>>>>> in your use case?
>>>>>>
>>>>>> On Thu, Oct 20, 2011 at 1:42 PM, Sebastian Schelter <[email protected]>
>>>>>> wrote:
>>>>>>> It's still included in SimilarityMatrixRowWrapperMapper. We also have a
>>>>>>> unit test that checks that a user is only recommended unknown items,
>>>>>>> and it still passes.
>>>>>
>>>>
>>>
>>
>
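The NaN trick Sebastian describes can be reproduced outside Mahout with plain double arrays. What follows is a minimal sketch, not Mahout's code. The class NaNMaskingSketch and its methods aggregate and recommendable are hypothetical names. Dense arrays stand in for Mahout's sparse vectors, the BOOLEAN_PREF_VALUE branch is not modelled, and only the numerator sum is shown. The name numerators in the quoted reducer suggests there is also a normalization step, which is left out here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class NaNMaskingSketch {

    // Hypothetical helper: sums the similarity vectors of all preferred items,
    // each weighted by the user's preference value (numerator part only).
    static double[] aggregate(double[][] similarity, Map<Integer, Double> prefs) {
        int numItems = similarity.length;
        double[] numerators = new double[numItems];
        for (Map.Entry<Integer, Double> pref : prefs.entrySet()) {
            double[] simColumn = similarity[pref.getKey()];
            for (int i = 0; i < numItems; i++) {
                // NaN * x and NaN + x are both NaN, so the self-similarity NaN
                // of a preferred item turns that item's entry into NaN for good.
                numerators[i] += simColumn[i] * pref.getValue();
            }
        }
        return numerators;
    }

    // Hypothetical helper: drops NaN entries (already preferred items). In this
    // dense sketch 0.0 stands for an entry a sparse vector would not store.
    static List<Integer> recommendable(double[] scores) {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < scores.length; i++) {
            if (!Double.isNaN(scores[i]) && scores[i] != 0.0) {
                items.add(i);
            }
        }
        return items;
    }

    public static void main(String[] args) {
        double nan = Double.NaN;
        // similarity[i][j] = similarity of item i to item j; the diagonal is
        // NaN, mirroring the "remove self similarity" step.
        double[][] similarity = {
            {nan, 0.5, 0.25, 0.0},
            {0.5, nan, 0.75, 0.125},
            {0.25, 0.75, nan, 0.5},
            {0.0, 0.125, 0.5, nan},
        };
        Map<Integer, Double> prefs = new TreeMap<>();
        prefs.put(0, 4.0); // the user rated item 0 with 4
        prefs.put(2, 1.0); // the user rated item 2 with 1
        double[] scores = aggregate(similarity, prefs);
        System.out.println(Arrays.toString(scores)); // [NaN, 2.75, NaN, 0.5]
        System.out.println(recommendable(scores));   // [1, 3]
    }
}
```

Items 0 and 2, which the user rated, end up as NaN and are dropped, so only items 1 and 3 remain as candidates. In this sketch an item is masked only if its own similarity vector takes part in the sum. A rated item without a similarity vector of its own would not receive a NaN entry, which may be worth keeping in mind when already-rated items show up in the output.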

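Ramon checks the recommendation line against user 49's input ratings by eye. The same check can be done mechanically. This is a minimal sketch with some assumptions. The input is taken to use the comma-separated userID,itemID,preference lines shown above. Each output line is taken to be the user ID, then whitespace, then a bracketed itemID:score list, as in the quoted excerpt. The class AlreadyRatedCheck and its methods are hypothetical names, and reading the actual job input and output files is left out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AlreadyRatedCheck {

    // Collects the IDs of the items the given user rated, from lines of the
    // form "userID,itemID,preference".
    static Set<Long> ratedItems(List<String> prefLines, long userId) {
        Set<Long> items = new HashSet<>();
        for (String line : prefLines) {
            String[] fields = line.trim().split(",");
            if (fields.length >= 2 && Long.parseLong(fields[0]) == userId) {
                items.add(Long.parseLong(fields[1]));
            }
        }
        return items;
    }

    // Takes one output line such as "49 [300420:5.0,312611:5.0]" and returns
    // the recommended item IDs that also appear in the user's own ratings.
    static List<Long> alreadyRated(String recommendationLine, Set<Long> rated) {
        List<Long> hits = new ArrayList<>();
        String[] parts = recommendationLine.trim().split("\\s+", 2);
        if (parts.length < 2) {
            return hits; // no recommendation list on this line
        }
        String list = parts[1].replace("[", "").replace("]", "");
        for (String entry : list.split(",")) {
            if (entry.isEmpty()) {
                continue; // e.g. an empty list "[]"
            }
            long itemId = Long.parseLong(entry.split(":")[0].trim());
            if (rated.contains(itemId)) {
                hits.add(itemId);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Three of user 49's ratings and the first three recommendations quoted above.
        List<String> prefs = List.of("49,409769,4", "49,312611,1", "49,428914,3");
        String recommendation = "49 [300420:5.0,312611:5.0,428914:5.0]";
        Set<Long> rated = ratedItems(prefs, 49L);
        System.out.println(alreadyRated(recommendation, rated)); // [312611, 428914]
    }
}
```

Suppose the check is applied to all of user 49's ratings quoted above and to the full recommendation line. It then returns the eight item IDs Ramon lists, in recommendation order. Only 300420 and 93087 are not already rated.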