In the Mahout recommender, 0 means the user has expressed no preference in any 
way. If it means something else to you, for example a low preference, then add 
one to all preference values so you have no “0” values. 
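
As a rough illustration of why the zeros matter, here is a minimal Python
sketch (not Mahout code). It assumes, as a simplification, that a 0 is treated
as a missing preference and that Pearson correlation is computed only over
users who rated both items. The helper names shift, pearson and
item_similarity are made up for this example; the data is the second input
file from the question quoted below.

```python
import math

def shift(triples, offset=1):
    """Add `offset` to every preference so that no value is 0."""
    return [(user, item, pref + offset) for user, item, pref in triples]

def pearson(xs, ys):
    """Pearson correlation, or None when it is undefined."""
    n = len(xs)
    if n == 0:
        return None
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    denom = math.sqrt(sum(d * d for d in dx) * sum(d * d for d in dy))
    if denom == 0:
        return None  # no variance left, e.g. only one co-rating user
    return sum(a * b for a, b in zip(dx, dy)) / denom

def item_similarity(triples, item_a, item_b):
    """Pearson similarity of two items, treating 0 as 'no preference'."""
    prefs = {}  # item -> {user: pref}; zeros are dropped (assumption)
    for user, item, pref in triples:
        if pref != 0:
            prefs.setdefault(item, {})[user] = pref
    a, b = prefs.get(item_a, {}), prefs.get(item_b, {})
    common = sorted(set(a) & set(b))
    return pearson([a[u] for u in common], [b[u] for u in common])

# (user, item, preference) triples from the second input file
zero_one = [(0, 0, 0), (1, 0, 1), (0, 1, 0), (1, 1, 1)]

sim_zero_one = item_similarity(zero_one, 0, 1)        # 0/1 values as given
sim_shifted = item_similarity(shift(zero_one), 0, 1)  # after adding one
print(sim_zero_one, sim_shifted)
```

Under these assumptions the 0/1 file leaves only one user with a preference
for both items, so the correlation is undefined, while the shifted file
(identical to the first input file) gives a defined similarity.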

I often turn ratings into boolean prefs, so if this is a rating I would drop 
the 0s from the data and decide whether 1 means anything. If it means they were 
ambivalent about the items, you may want to drop it too. In general you are 
looking for a signal of clear preference. How do you deal with clearly 
un-preferred, disliked things? Good question, but not by mixing them with the 
preferred things. One way I have dealt with disliked items is to filter them 
from recommendations for each user.
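
A minimal Python sketch of that preprocessing, outside of Mahout. The
thresholds like_min and dislike_max, the function names, and the sample
ratings are all hypothetical and only serve to show the idea: keep clear likes
as boolean prefs, drop zeros and ambivalent ratings, and keep dislikes aside
so they can be filtered from each user's recommendations.

```python
from collections import defaultdict

def split_preferences(ratings, like_min, dislike_max):
    """Turn (user, item, rating) triples into boolean likes and a dislike map."""
    liked = []                   # (user, item) pairs kept as boolean prefs
    disliked = defaultdict(set)  # user -> items to filter out later
    for user, item, rating in ratings:
        if rating == 0:
            continue             # 0 = no preference expressed: drop
        if rating >= like_min:
            liked.append((user, item))
        elif rating <= dislike_max:
            disliked[user].add(item)
        # anything in between is treated as ambivalent and dropped
    return liked, dict(disliked)

def filter_recommendations(user, recommended, disliked):
    """Remove items the user has clearly disliked from a recommendation list."""
    blocked = disliked.get(user, set())
    return [item for item in recommended if item not in blocked]

# Made-up ratings on a hypothetical 1-5 scale, with 0 meaning "no rating"
ratings = [(1, 10, 5), (1, 11, 1), (1, 12, 0), (2, 10, 3), (2, 13, 4)]
liked, disliked = split_preferences(ratings, like_min=4, dislike_max=2)
kept = filter_recommendations(1, [11, 13], disliked)
print(liked, disliked, kept)
```

The liked pairs would be written out as the boolean input for the
recommender, and the filter applied to its output per user.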

BTW LogLikelihoodRatio will give better results in most cases I’ve seen.

On Jul 2, 2014, at 3:50 AM, Peng Zhang <[email protected]> wrote:

Hi,

Should I avoid using 0 as a preference value in Mahout’s input file when 
generating recommendations?

I am running mahout-0.9’s recommenditembased on a Hadoop 2.0 cluster with two 
nodes, with Pearson correlation as the similarity class. If I use 1 and 2 as 
preference values, the generated similarity is correct; but if I use 0 and 1 as 
preference values, the generated similarity is missing.

1. Input File:
0,0,1
1,0,2
0,1,1
1,1,2
Generated similarity between item 1 and 0 is 0.9999, which is correct

2. Input File:
0,0,0
1,0,1
0,1,0
1,1,1 
Similarity is not generated between item 1 and 0, which is not as expected

3. Detailed Command:
1. Run Recommendation
mahout recommenditembased -s SIMILARITY_PEARSON_CORRELATION -i 0_1_tuples.csv \
  -o output --numRecommendations 5 --outputPathForSimilarityMatrix \
  similarityMatrix --randomSeed 2014
2. View similarity between item 1 and 0:
hdfs dfs -cat similarityMatrix/part-r-00000


Thank you,

Peng






