As I already said multiple times, please use Mahout 0.6. It contains bug
fixes and performance improvements for this particular job.
--sebastian
On 21.10.2011 09:04, WangRamon wrote:
>
> Hi Sebastian,
>
> I made the following changes to work around the issue locally, on Mahout 0.5.
> I may be wrong, but the test result is correct:
>
> 1) I added an "int itemIdIndex" property with getter/setter methods to class
>    PrefAndSimilarityColumnWritable. It holds the index of the item that the
>    "prefValue" in this class refers to.
>
> 2) I added "prefAndSimilarityColumn.setItemIdIndex(key.get());" to class
>    PartialMultiplyMapper at line 51 to set the property created in step 1.
>
> 3) In class AggregateAndRecommendReducer, I added the following code at
>    line 147:
>
>      // item which user has already given preference
>      int itemIdIndex = prefAndSimilarityColumn.getItemIdIndex();
>      // exclude item user has already given preference
>      simColumn.set(itemIdIndex, Double.NaN);
>
>    This sets the entry at that index of the sim column to NaN for an item
>    the user has already rated. Any later plus or times on this vector also
>    yields NaN at that index, so the items the user has already shown a
>    preference for are excluded from the recommendations.
>
> 4) At line 173 of the same class, AggregateAndRecommendReducer, I added a
>    check that makes the prediction NaN for those items:
>
>      double prediction = Double.NaN;
>      if (!Double.isNaN(element.get())) {
>        prediction = element.get() / denominators.getQuick(itemIDIndex);
>      }
>
> With that I get the correct recommendations. I have thought about it
> carefully, but I may still be wrong, so I would be glad to hear your opinion.
> Again, thank you very much.
>
> Cheers
> Ramon
>
>> From: [email protected]
>> To: [email protected]
>> Subject: RE: Recommend result contains item which user has already given
>> preference, is that correct?
>> Date: Fri, 21 Oct 2011 10:01:12 +0800
>>
>>
>> Hi Sebastian,
>>
>> Unfortunately, I still get wrong data from the RecommenderJob after
>> cleaning everything. Here is part of the recommendation output:
>>
>> 49
>> [300420:5.0,312611:5.0,428914:5.0,208617:5.0,345206:5.0,411909:5.0,363683:5.0,248872:5.0,93087:5.0,494200:5.0]
>>
>> Now look at the input data for user 49 below. Items 312611, 428914, 208617,
>> 345206, 411909, 363683, 248872 and 494200 are wrong recommendations, because
>> the user has already rated them; that is nearly all of them. I would like to
>> send you the test data, but it is 50M+ in size. Can we discuss this offline?
>> Thank you very much.
>>
>> 49,409769,4
>> 49,98795,4
>> 49,262163,1
>> 49,66009,4
>> 49,414484,2
>> 49,405329,3
>> 49,312611,1
>> 49,336441,4
>> 49,136494,5
>> 49,345206,3
>> 49,479179,1
>> 49,318960,4
>> 49,52683,3
>> 49,270840,3
>> 49,264828,1
>> 49,222390,4
>> 49,456614,5
>> 49,436207,5
>> 49,306308,2
>> 49,391582,5
>> 49,494200,4
>> 49,423328,3
>> 49,112997,3
>> 49,229347,5
>> 49,474928,3
>> 49,349350,1
>> 49,208508,3
>> 49,314397,2
>> 49,14673,2
>> 49,496041,4
>> 49,301875,4
>> 49,234234,1
>> 49,325287,3
>> 49,35756,5
>> 49,365097,4
>> 49,13376,4
>> 49,333634,2
>> 49,283494,5
>> 49,208617,3
>> 49,245390,1
>> 49,221804,2
>> 49,347821,3
>> 49,138954,5
>> 49,164206,5
>> 49,72238,1
>> 49,356632,1
>> 49,452296,3
>> 49,182288,5
>> 49,499031,5
>> 49,150727,4
>> 49,240533,5
>> 49,326081,4
>> 49,220683,2
>> 49,196527,2
>> 49,177165,3
>> 49,411709,5
>> 49,360722,3
>> 49,466310,1
>> 49,160375,2
>> 49,137203,5
>> 49,32634,4
>> 49,62134,5
>> 49,96982,5
>> 49,196951,1
>> 49,304155,5
>> 49,406109,4
>> 49,244276,5
>> 49,189552,1
>> 49,442215,3
>> 49,268806,2
>> 49,364912,2
>> 49,410896,5
>> 49,450602,5
>> 49,151703,1
>> 49,248872,4
>> 49,21684,1
>> 49,41196,1
>> 49,26614,2
>> 49,369075,5
>> 49,321916,1
>> 49,325081,1
>> 49,329877,4
>> 49,344661,4
>> 49,8429,3
>> 49,69279,1
>> 49,143695,1
>> 49,229120,2
>> 49,26298,4
>> 49,54456,1
>> 49,75937,4
>> 49,87042,3
>> 49,345383,5
>> 49,363683,4
>> 49,128047,3
>> 49,234878,5
>> 49,428914,3
>> 49,353107,2
>> 49,266850,4
>> 49,421211,3
>> 49,265739,4
>> 49,303723,1
>> 49,244575,4
>> 49,303625,4
>> 49,350481,5
>> 49,63985,4
>> 49,207327,3
>> 49,397535,1
>> 49,300916,5
>> 49,358094,4
>> 49,314919,5
>> 49,309355,5
>> 49,403169,5
>> 49,90148,4
>> 49,224056,4
>> 49,359181,2
>> 49,341927,5
>> 49,436521,4
>> 49,480682,4
>> 49,315561,3
>> 49,218647,5
>> 49,245276,2
>> 49,93189,1
>> 49,204695,4
>> 49,498350,5
>> 49,155787,3
>> 49,112730,3
>> 49,416756,2
>> 49,411909,4
>> 49,253353,2
>> 49,196663,5
>> 49,40903,3
>> 49,51873,2
>> 49,66925,3
>>> Date: Thu, 20 Oct 2011 18:40:38 +0200
>>> From: [email protected]
>>> To: [email protected]
>>> Subject: Re: Recommend result contains item which user has already given
>>> preference, is that correct?
>>>
>>> To put it simply:
>>>
>>> The vector of recommendations is the sum of the similarity vectors for
>>> all preferred items. In each similarity vector for a preferred item the
>>> entry for that particular item is set to NaN.
>>>
>>> That means that in the recommendation vector the entries for all
>>> preferred items will be NaN.
>>>
>>> It's a neat trick that is unfortunately very hard to see in the code.
>>>
>>> --sebastian
>>>
>>> On 20.10.2011 18:36, WangRamon wrote:
>>>>
>>>> Hi Sebastian,
>>>> "But as the entry for the item itself is set to NaN in its similarity
>>>> vector and NaN plus something stays always NaN, the predicted preference
>>>> for an item that was already preferred is NaN. And the NaN entries are
>>>> dropped later."
>>>> Wait a minute. I understand that NaN plus something always stays NaN, but
>>>> how do you explain "the predicted preference for an item that was already
>>>> preferred is NaN"? Where is the code that checks whether an item was
>>>> already preferred? The only NaN handling in
>>>> SimilarityMatrixRowWrapperMapper says that an item compared with itself
>>>> (A to A) has a similarity of NaN, am I right?
>>>> Thanks
>>>> Ramon
>>>>> Date: Thu, 20 Oct 2011 17:04:20 +0200
>>>>> From: [email protected]
>>>>> To: [email protected]
>>>>> Subject: Re: Recommend result contains item which user has already given
>>>>> preference, is that correct?
>>>>>
>>>>> On 20.10.2011 16:57, WangRamon wrote:
>>>>>>
>>>>>> Hi Sebastian and Sean
>>>>>> Thanks for your help.
>>>>>>
>>>>>> I re-read the code (setting up an environment for debugging seems to
>>>>>> be very difficult for me) and found this line in
>>>>>> SimilarityMatrixRowWrapperMapper. I paste it below with its comment:
>>>>>> /* remove self similarity */
>>>>>> similarityMatrixRow.set(key.get(), Double.NaN);
>>>>>> I think its purpose is to mark the similarity between Item X and Item X
>>>>>> (the same item) as NaN, but that does not exclude Item X from the
>>>>>> recommendations. Then AggregateAndRecommendReducer uses
>>>>>> simColumn.times(prefValue) as part of the formula to calculate the
>>>>>> preferences for all items that are similar to Item i (which could be
>>>>>> Item X or some other item), and returns the top 10 (the default) for a
>>>>>> user. During this process, I cannot see any code that excludes an item
>>>>>> the user has already rated from the recommendations.
>>>>>
>>>>> It's a little bit hidden :) For each preferred item, a vector of all its
>>>>> similarities is added:
>>>>>
>>>>> numerators = numerators == null
>>>>>     ? prefValue == BOOLEAN_PREF_VALUE
>>>>>         ? simColumn.clone()
>>>>>         : simColumn.times(prefValue)
>>>>>     : numerators.plus(prefValue == BOOLEAN_PREF_VALUE
>>>>>         ? simColumn
>>>>>         : simColumn.times(prefValue));
>>>>>
>>>>> But as the entry for the item itself is set to NaN in its similarity
>>>>> vector and NaN plus something stays always NaN, the predicted preference
>>>>> for an item that was already preferred is NaN. And the NaN entries are
>>>>> dropped later.
>>>>>
>>>>> --sebastian
>>>>>
>>>>>
>>>>>> Correct me if I am missing something. Thank you, guys.
>>>>>> Cheers, Ramon
>>>>>>> Date: Thu, 20 Oct 2011 13:59:28 +0100
>>>>>>> Subject: Re: Recommend result contains item which user has already
>>>>>>> given preference, is that correct?
>>>>>>> From: [email protected]
>>>>>>> To: [email protected]
>>>>>>>
>>>>>>> Ah OK, figured as much. WangRamon does that answer your question
>>>>>>> and/or can you debug to see if this is happening, not happening for
>>>>>>> you in your use case?
>>>>>>>
>>>>>>> On Thu, Oct 20, 2011 at 1:42 PM, Sebastian Schelter <[email protected]>
>>>>>>> wrote:
>>>>>>>> It's still included in SimilarityMatrixRowWrapperMapper. We also have a
>>>>>>>> unit test that checks whether a user is only recommended unknown items,
>>>>>>>> and that test still passes.
>>>>>>
>>>>>
>>>>
>>>
>>
>
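
For anyone following the thread, the NaN mechanism described above can be
sketched in plain Java. This is only an illustration under assumptions, not
the code of AggregateAndRecommendReducer: it uses arrays instead of Mahout's
vector classes, the names NanExclusionSketch and recommendableItems are made
up for the example, 0.0 stands in for "no preference", and the denominator is
assumed to be the sum of absolute similarities.

```java
import java.util.ArrayList;
import java.util.List;

class NanExclusionSketch {

    /**
     * Returns the indexes of items whose predicted preference is not NaN.
     * similarity[i][j] is the similarity of item i to item j; prefs[i] is the
     * user's preference for item i, with 0.0 meaning "no preference"
     * (an assumption of this sketch only).
     */
    static List<Integer> recommendableItems(double[][] similarity, double[] prefs) {
        int n = prefs.length;
        double[] numerators = new double[n];
        double[] denominators = new double[n];

        for (int item = 0; item < n; item++) {
            if (prefs[item] == 0.0) {
                continue; // user has not rated this item
            }
            double[] simColumn = similarity[item].clone();
            simColumn[item] = Double.NaN; // remove self similarity

            // Add the weighted similarity column; the NaN entry poisons the
            // sum at the index of the already preferred item.
            for (int other = 0; other < n; other++) {
                numerators[other] += simColumn[other] * prefs[item];
                denominators[other] += Math.abs(simColumn[other]);
            }
        }

        List<Integer> result = new ArrayList<>();
        for (int item = 0; item < n; item++) {
            double prediction = numerators[item] / denominators[item];
            if (!Double.isNaN(prediction)) { // NaN entries are dropped
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] similarity = {
            {1.0, 0.5, 0.2, 0.1},
            {0.5, 1.0, 0.3, 0.4},
            {0.2, 0.3, 1.0, 0.6},
            {0.1, 0.4, 0.6, 1.0},
        };
        double[] prefs = {4.0, 0.0, 5.0, 0.0}; // user rated items 0 and 2
        System.out.println(recommendableItems(similarity, prefs)); // prints [1, 3]
    }
}
```

Items 0 and 2 never reach the result because their own similarity column
contributes NaN at their own index, and no later addition can undo that.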