Using 1 is just fine, for the reasons you give. You would be surprised how well this works even for dislikes. In fact, you can just omit the third field in your CSV.
However, you need to set the boolean data flag (--booleanData) and choose a similarity metric that is defined over such data, such as Tanimoto or log-likelihood. Pearson and cosine, for example, are not, since every value is 1. This is why there is no output.

On Jun 23, 2012 1:33 AM, "Something Something" <[email protected]> wrote:
> I tested my setup of ItemSimilarityJob using the MovieLens dataset & got
> the expected results. It looks like my setup is good.
>
> Here's what I have:
>
> I have data coming in the following format: UserId, GroupId, Frequency (how
> many times the user chose the group), Max timestamp (the last time the user
> chose the group).
>
> Based on this dataset we need to figure out which groups look alike. I
> decided to use "item based collaborative filtering" but I have 3 concerns:
>
> 1) We don't have any knowledge of "Dislikes"; we only know which groups
> users "Like".
> 2) We don't really have ratings. In other words, users don't rate the
> group. Either they choose OR they don't.
> 3) Frequency doesn't really imply interest level.
>
> I decided to try 'ItemSimilarityJob' by using a CSV file in the following
> format:
>
> UserId, GroupId, "1"
>
> In other words, the rating value is always 1. There are NO rows with value
> "0". This is producing NO OUTPUT, but the job finishes successfully.
>
> Is this the right way to solve the problem? Is there some other Algorithm
> that I should be using? Thanks for the help.
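To illustrate the point about metrics: here is a rough Python sketch, not Mahout code, of why Pearson breaks down when every rating is 1 while a set-based measure such as Tanimoto still works. The user and group IDs and the helper names (users_by_item, tanimoto, pearson) are made up for the example.

```python
from math import sqrt

# Hypothetical (user, group) pairs; no rating column, as with boolean data.
pairs = [
    ("u1", "gA"), ("u1", "gB"),
    ("u2", "gA"), ("u2", "gB"),
    ("u3", "gA"), ("u3", "gC"),
    ("u4", "gC"),
]

def users_by_item(pairs):
    """Map each item to the set of users who chose it."""
    items = {}
    for user, item in pairs:
        items.setdefault(item, set()).add(user)
    return items

def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pearson(xs, ys):
    """Pearson correlation, or None when either side has zero variance."""
    n = len(xs)
    if n == 0:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None  # the correlation would be 0/0
    return cov / sqrt(vx * vy)

items = users_by_item(pairs)

# Ratings from the users who chose both gA and gB are all 1.0.
common = items["gA"] & items["gB"]
ones = [1.0] * len(common)
pearson_ab = pearson(ones, ones)                 # None: zero variance
tanimoto_ab = tanimoto(items["gA"], items["gB"]) # 2 shared users out of 3
tanimoto_bc = tanimoto(items["gB"], items["gC"]) # no shared users

print(pearson_ab, tanimoto_ab, tanimoto_bc)
```

With constant ratings the Pearson numerator and denominator are both zero, so no similarity can be computed for any pair, whereas the set overlap still separates similar groups from dissimilar ones.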
