Send me the data and I will have a look at it.

Can you say with what arguments you invoke RecommenderJob?

--sebastian

On 21.10.2011 04:01, WangRamon wrote:
> 
> Hi Sebastian,
>
> Unfortunately, I still get wrong data from the RecommenderJob after
> cleaning everything. Check the following part of the recommendation
> result for user 49:
>
> 49 [300420:5.0,312611:5.0,428914:5.0,208617:5.0,345206:5.0,411909:5.0,363683:5.0,248872:5.0,93087:5.0,494200:5.0]
>
> Now look at the input data for user 49 below. Items 312611, 428914,
> 208617, 345206, 411909, 363683, 248872 and 494200 already appear there,
> so they are wrong recommendations; nearly all of them are wrong. I would
> like to send you the test data, but it is 50M+ in size. Can we discuss
> offline? Thank you very much.
>
> 49,409769,4
> 49,98795,4
> 49,262163,1
> 49,66009,4
> 49,414484,2
> 49,405329,3
> 49,312611,1
> 49,336441,4
> 49,136494,5
> 49,345206,3
> 49,479179,1
> 49,318960,4
> 49,52683,3
> 49,270840,3
> 49,264828,1
> 49,222390,4
> 49,456614,5
> 49,436207,5
> 49,306308,2
> 49,391582,5
> 49,494200,4
> 49,423328,3
> 49,112997,3
> 49,229347,5
> 49,474928,3
> 49,349350,1
> 49,208508,3
> 49,314397,2
> 49,14673,2
> 49,496041,4
> 49,301875,4
> 49,234234,1
> 49,325287,3
> 49,35756,5
> 49,365097,4
> 49,13376,4
> 49,333634,2
> 49,283494,5
> 49,208617,3
> 49,245390,1
> 49,221804,2
> 49,347821,3
> 49,138954,5
> 49,164206,5
> 49,72238,1
> 49,356632,1
> 49,452296,3
> 49,182288,5
> 49,499031,5
> 49,150727,4
> 49,240533,5
> 49,326081,4
> 49,220683,2
> 49,196527,2
> 49,177165,3
> 49,411709,5
> 49,360722,3
> 49,466310,1
> 49,160375,2
> 49,137203,5
> 49,32634,4
> 49,62134,5
> 49,96982,5
> 49,196951,1
> 49,304155,5
> 49,406109,4
> 49,244276,5
> 49,189552,1
> 49,442215,3
> 49,268806,2
> 49,364912,2
> 49,410896,5
> 49,450602,5
> 49,151703,1
> 49,248872,4
> 49,21684,1
> 49,41196,1
> 49,26614,2
> 49,369075,5
> 49,321916,1
> 49,325081,1
> 49,329877,4
> 49,344661,4
> 49,8429,3
> 49,69279,1
> 49,143695,1
> 49,229120,2
> 49,26298,4
> 49,54456,1
> 49,75937,4
> 49,87042,3
> 49,345383,5
> 49,363683,4
> 49,128047,3
> 49,234878,5
> 49,428914,3
> 49,353107,2
> 49,266850,4
> 49,421211,3
> 49,265739,4
> 49,303723,1
> 49,244575,4
> 49,303625,4
> 49,350481,5
> 49,63985,4
> 49,207327,3
> 49,397535,1
> 49,300916,5
> 49,358094,4
> 49,314919,5
> 49,309355,5
> 49,403169,5
> 49,90148,4
> 49,224056,4
> 49,359181,2
> 49,341927,5
> 49,436521,4
> 49,480682,4
> 49,315561,3
> 49,218647,5
> 49,245276,2
> 49,93189,1
> 49,204695,4
> 49,498350,5
> 49,155787,3
> 49,112730,3
> 49,416756,2
> 49,411909,4
> 49,253353,2
> 49,196663,5
> 49,40903,3
> 49,51873,2
> 49,66925,3
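
The check described in the message above (comparing the recommended item IDs with the items user 49 has already rated) can be sketched as follows. This is a minimal, hypothetical helper and not part of Mahout: the class and method names are made up for illustration, the recommendation line is copied from the message, and the rated set is only an excerpt of the input rows listed above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper: finds recommended items that the user has already rated. */
class OverlapCheckSketch {

    /** Parses a non-empty "[id:score,id:score,...]" list into item IDs, keeping their order. */
    static Set<Long> recommendedIds(String line) {
        Set<Long> ids = new LinkedHashSet<>();
        String body = line.trim();
        body = body.substring(1, body.length() - 1); // strip the surrounding [ ]
        for (String entry : body.split(",")) {
            ids.add(Long.parseLong(entry.split(":")[0].trim()));
        }
        return ids;
    }

    /** Returns the recommended IDs that also occur in the user's rated items. */
    static Set<Long> alreadyRated(Set<Long> recommended, Set<Long> rated) {
        Set<Long> overlap = new LinkedHashSet<>(recommended);
        overlap.retainAll(rated);
        return overlap;
    }

    public static void main(String[] args) {
        String line = "[300420:5.0,312611:5.0,428914:5.0,208617:5.0,345206:5.0,"
                + "411909:5.0,363683:5.0,248872:5.0,93087:5.0,494200:5.0]";
        // Excerpt of user 49's input rows (item IDs only), not the full list.
        Set<Long> rated = new HashSet<>(Arrays.asList(
                409769L, 98795L, 262163L, 312611L, 345206L, 494200L, 208617L,
                248872L, 363683L, 428914L, 411909L, 300916L, 93189L));
        Set<Long> overlap = alreadyRated(recommendedIds(line), rated);
        System.out.println(overlap.size() + " already rated: " + overlap);
    }
}
```

With this data the sketch reports eight of the ten recommended items as already rated, which matches the list given in the message.
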
>> Date: Thu, 20 Oct 2011 18:40:38 +0200
>> From: [email protected]
>> To: [email protected]
>> Subject: Re: Recommend result contains item which user has already given 
>> preference, is that correct?
>>
>> To put it simply:
>>
>> The vector of recommendations is the sum of the similarity vectors for
>> all preferred items. In each similarity vector for a preferred item the
>> entry for that particular item is set to NaN.
>>
>> That means that in the recommendation vector the entries for all
>> preferred items will be NaN.
>>
>> It's a neat trick that is unfortunately very hard to see in the code.
>>
>> --sebastian
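
The trick described in the message above can be sketched in isolation. The following is a minimal, self-contained illustration and not the Mahout code: plain dense double arrays stand in for Mahout's vectors, and the class name, method names and the 4x4 similarity matrix are assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Standalone sketch of NaN masking; plain arrays stand in for Mahout vectors. */
class NanMaskSketch {

    /** Returns a copy of one similarity row with the self-similarity entry set to NaN. */
    static double[] maskSelf(double[] similarityRow, int itemIndex) {
        double[] masked = similarityRow.clone();
        masked[itemIndex] = Double.NaN;
        return masked;
    }

    /** Sums prefValue * maskedRow over all items the user has a preference for. */
    static double[] sumWeighted(double[][] similarity, int[] preferredItems, double[] prefValues) {
        double[] numerators = new double[similarity.length];
        for (int p = 0; p < preferredItems.length; p++) {
            double[] row = maskSelf(similarity[preferredItems[p]], preferredItems[p]);
            for (int j = 0; j < row.length; j++) {
                // NaN times anything is NaN, and NaN plus anything stays NaN.
                numerators[j] += prefValues[p] * row[j];
            }
        }
        return numerators;
    }

    /** Drops the NaN entries, i.e. the items the user already has a preference for. */
    static List<Integer> candidates(double[] numerators) {
        List<Integer> kept = new ArrayList<>();
        for (int j = 0; j < numerators.length; j++) {
            if (!Double.isNaN(numerators[j])) {
                kept.add(j);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up symmetric item-item similarities for four items.
        double[][] similarity = {
            {1.0, 0.5, 0.2, 0.1},
            {0.5, 1.0, 0.3, 0.4},
            {0.2, 0.3, 1.0, 0.6},
            {0.1, 0.4, 0.6, 1.0},
        };
        int[] preferred = {0, 2};    // the user has rated items 0 and 2
        double[] prefs = {4.0, 5.0}; // with these preference values
        double[] numerators = sumWeighted(similarity, preferred, prefs);
        System.out.println(candidates(numerators)); // prints [1, 3]
    }
}
```

Items 0 and 2 end up as NaN in the summed vector because each contributes a NaN at its own index, so only the unrated items 1 and 3 survive the final filter. No explicit "already preferred" check is needed.
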
>>
>> On 20.10.2011 18:36, WangRamon wrote:
>>>
>>> Hi Sebastian,
>>>
>>> "But as the entry for the item itself is set to NaN in its similarity
>>> vector and NaN plus something stays always NaN, the predicted preference
>>> for an item that was already preferred is NaN. And the NaN entries are
>>> dropped later."
>>>
>>> Wait a minute. I understand that NaN plus something always stays NaN,
>>> but how do you explain "the predicted preference for an item that was
>>> already preferred is NaN"? Where is the code that checks whether an item
>>> was already preferred? The only thing SimilarityMatrixRowWrapperMapper
>>> does with NaN is to mark the similarity of an item with itself (A to A)
>>> as NaN, am I right?
>>>
>>> Thanks,
>>> Ramon
>>>> Date: Thu, 20 Oct 2011 17:04:20 +0200
>>>> From: [email protected]
>>>> To: [email protected]
>>>> Subject: Re: Recommend result contains item which user has already given 
>>>> preference, is that correct?
>>>>
>>>> On 20.10.2011 16:57, WangRamon wrote:
>>>>>
>>>>> Hi Sebastian and Sean,
>>>>>
>>>>> Thanks for your help.
>>>>>
>>>>> I re-read the code (setting up a debugging environment seems to be
>>>>> very difficult for me) and found this line in
>>>>> SimilarityMatrixRowWrapperMapper, which I paste below with its comment:
>>>>>
>>>>>     /* remove self similarity */
>>>>>     similarityMatrixRow.set(key.get(), Double.NaN);
>>>>>
>>>>> I think the intent is to mark the similarity between Item X and Item X
>>>>> (the identical one) as NaN, but that does not exclude Item X from the
>>>>> recommendations. Then AggregateAndRecommendReducer uses
>>>>> simColumn.times(prefValue) as part of the formula to calculate the
>>>>> preferences for all items that are similar to Item i (which could be
>>>>> Item X or some other item), and returns the top 10 (the default) for a
>>>>> user.
>>>>>
>>>>> During this process, I cannot see any code that excludes an item for
>>>>> which the user has already given a preference from the recommendations.
>>>>
>>>> It's a little bit hidden :) For each preferred item, a vector of all its
>>>> similarities is added:
>>>>
>>>>       numerators = numerators == null
>>>>           ? prefValue == BOOLEAN_PREF_VALUE ? simColumn.clone() : simColumn.times(prefValue)
>>>>           : numerators.plus(prefValue == BOOLEAN_PREF_VALUE ? simColumn : simColumn.times(prefValue));
>>>>
>>>> But as the entry for the item itself is set to NaN in its similarity
>>>> vector and NaN plus something stays always NaN, the predicted preference
>>>> for an item that was already preferred is NaN. And the NaN entries are
>>>> dropped later.
>>>>
>>>> --sebastian
>>>>
>>>>
>>>>> Correct me if I am missing something. Thank you, guys.
>>>>>
>>>>> Cheers,
>>>>> Ramon
>>>>>> Date: Thu, 20 Oct 2011 13:59:28 +0100
>>>>>> Subject: Re: Recommend result contains item which user has already given 
>>>>>> preference, is that correct?
>>>>>> From: [email protected]
>>>>>> To: [email protected]
>>>>>>
>>>>>> Ah OK, figured as much. WangRamon, does that answer your question,
>>>>>> and/or can you debug to see whether or not this is happening for
>>>>>> you in your use case?
>>>>>>
>>>>>> On Thu, Oct 20, 2011 at 1:42 PM, Sebastian Schelter <[email protected]> 
>>>>>> wrote:
>>>>>>> It's still included in SimilarityMatrixRowWrapperMapper. We also have a
>>>>>>> unit test that checks whether a user is only recommended unknown items
>>>>>>> which still works.
