This is because every rating is implicitly 1.0 when the input contains no rating values.
But I actually think this is symptomatic of a problem, since I note that those recommendations are suspiciously ordered by item ID. I am not sure the current state of the distributed recommender is compatible with boolean data, but I am not an expert here -- Sebastian, can we discuss what might be going on?

In the non-distributed code, items are given a "fake" estimated preference, which is not actually an estimated preference (because that would always be 1.0) but some other number that functions as a score -- average similarity to other items, for example. This is used for ranking and is also returned as the "estimated preference" even though it isn't one. Can we do something like that here? Or is it already working this way if certain values / options are set?

On Fri, Nov 26, 2010 at 6:26 PM, Jordi Abad <[email protected]> wrote:
> Hi,
>
> I'm running a RecommenderJob (mahout-0.4 version) over hadoop like this:
>
> hadoop-0.20 jar /mahout-distribution-0.4/mahout-core-0.4-job.jar
> org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
> -Dmapred.input.dir=input -Dmapred.output.dir=output -s
> SIMILARITY_TANIMOTO_COEFFICIENT -b true
>
> The job works fine but when I examine the result I get things like:
>
> 12 [1:1.0,2:1.0,3:1.0,5:1.0,6:1.0,11:1.0,168:1.0,173:1.0,180:1.0,199:1.0]
> 14 [1:1.0,2:1.0,3:1.0,5:1.0,6:1.0,11:1.0,14:1.0,21:1.0,22:1.0,23:1.0]
> ...
>
> I can't understand why each recommendation gets 1.0 of score. It doesn't
> matter which SimilarityClass I set. I always get a score of 1.0.
>
> My input file is a "boolean file" (1391374 rows) with values like:
>
> 1,6496241
> 1,4368916
> 1,4922226
> 1,4958662
> ...
>
> If I run
> "org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob" job
> over the same file I get good results for items.
>
> Any ideas?
>
> Thanks in advance.
>
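For reference, here is a rough sketch of the scoring idea from the non-distributed code: with boolean data, rank each candidate item by its average similarity to the items the user already has, instead of by an estimated preference that would always be 1.0. This is plain Java and not Mahout's actual implementation. The class name, method names, and toy data are all made up for illustration, and Tanimoto is used only because it is the similarity in the command quoted above.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Mahout code: score boolean-data candidates
// by average Tanimoto similarity to the items a user already has.
public class BooleanScoreSketch {

    // Tanimoto coefficient of two user sets: |A ∩ B| / |A ∪ B|.
    static double tanimoto(Set<Long> a, Set<Long> b) {
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int union = a.size() + b.size() - intersection.size();
        return union == 0 ? 0.0 : (double) intersection.size() / union;
    }

    // Average similarity between the candidate item and each item the user has.
    // itemUsers maps an item ID to the set of user IDs associated with it.
    static double averageSimilarityScore(long candidate,
                                         Set<Long> userItems,
                                         Map<Long, Set<Long>> itemUsers) {
        if (userItems.isEmpty()) {
            return 0.0;
        }
        Set<Long> candidateUsers = itemUsers.getOrDefault(candidate, Set.of());
        double total = 0.0;
        for (long owned : userItems) {
            total += tanimoto(candidateUsers, itemUsers.getOrDefault(owned, Set.of()));
        }
        return total / userItems.size();
    }

    public static void main(String[] args) {
        // Toy data (assumed): item ID -> users who have that item.
        Map<Long, Set<Long>> itemUsers = Map.of(
                1L, Set.of(1L, 2L),
                2L, Set.of(1L, 2L, 3L),
                3L, Set.of(2L, 3L),
                4L, Set.of(3L));
        Set<Long> userItems = Set.of(1L, 2L); // items user 1 already has

        for (long candidate : new long[] {3L, 4L}) {
            System.out.println(candidate + ":"
                    + averageSimilarityScore(candidate, userItems, itemUsers));
        }
    }
}
```

On this toy data item 3 scores about 0.5 and item 4 about 0.17, so the output would carry a usable ranking instead of a flat 1.0 for every item.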
