I tested my ItemSimilarityJob setup on the MovieLens dataset and got the
expected results, so the setup itself seems to be working.

Here's what I have:

I have data coming in the following format: UserId, GroupId, Frequency (how
many times the user chose the group), Max timestamp (the last time the user
chose the group).

Based on this dataset we need to figure out which groups are similar. I
decided to use item-based collaborative filtering, but I have 3 concerns:

1)  We don't have any knowledge of "Dislikes"; we only know which groups
users "Like".
2)  We don't really have ratings. Users don't rate a group; they either
choose it or they don't.
3)  Frequency doesn't reliably indicate how interested a user is.
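Because the data only records which groups each user chose (no dislikes, no ratings), one measure often used for this kind of like-only data is the Tanimoto (Jaccard) coefficient: the number of users who chose both groups divided by the number who chose either. The sketch below is a plain-Python illustration of that measure, not ItemSimilarityJob's implementation; the function name and the sample user/group IDs are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def tanimoto_item_similarities(pairs):
    """Hypothetical helper: Tanimoto (Jaccard) similarity between groups
    computed from (user, group) "like" pairs with no rating values."""
    # Build the set of users who chose each group.
    users_by_item = defaultdict(set)
    for user, item in pairs:
        users_by_item[item].add(user)

    sims = {}
    for a, b in combinations(sorted(users_by_item), 2):
        ua, ub = users_by_item[a], users_by_item[b]
        shared = len(ua & ub)
        # Skip pairs that no user chose together (similarity would be 0).
        if shared:
            sims[(a, b)] = shared / len(ua | ub)
    return sims


# Hypothetical sample data: each tuple means "user chose group".
pairs = [("u1", "g1"), ("u1", "g2"), ("u2", "g1"),
         ("u2", "g2"), ("u3", "g1"), ("u3", "g3")]
sims = tanimoto_item_similarities(pairs)
```

Here g1 and g2 share two of three users (similarity 2/3), while g2 and g3 share none and are left out. Only co-occurrence matters, so the missing dislikes and ratings do not affect the score.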


I tried ItemSimilarityJob with a CSV file in the following format:

UserId, GroupId, "1"

In other words, the rating value is always 1; there are NO rows with value
"0". The job finishes successfully, but it produces NO output.
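A minimal sketch of the conversion described above, collapsing (UserId, GroupId, Frequency, Max timestamp) rows into (UserId, GroupId, 1) CSV rows. It also shows one plausible cause of the empty output, *assuming* a mean-centered measure such as Pearson correlation was selected: when every value is 1, each vector has zero variance, so the correlation is undefined (NaN) and no similarity can be emitted. This is an assumption about the job's configuration, not a confirmed diagnosis; the helper names and sample rows are hypothetical.

```python
import csv
import io
import math


def to_boolean_rows(rows):
    """Hypothetical helper: drop Frequency and Max timestamp, keep one
    (user, group, 1) row per distinct user/group pair."""
    seen = set()
    out = []
    for user, group, _frequency, _max_ts in rows:
        if (user, group) not in seen:
            seen.add((user, group))
            out.append((user, group, 1))
    return out


def to_csv(rows):
    """Serialize rows as CSV text (comma-separated, newline-terminated)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def pearson(xs, ys):
    """Plain Pearson correlation; NaN when either vector has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")  # constant "all 1" vectors land here
    return cov / (sx * sy)


# Hypothetical input rows: (UserId, GroupId, Frequency, Max timestamp).
raw = [("u1", "g1", 5, 1700000000),
       ("u1", "g2", 1, 1700000500),
       ("u2", "g1", 2, 1700001000)]
csv_text = to_csv(to_boolean_rows(raw))
all_ones = pearson([1, 1, 1], [1, 1, 1])
```

With all-1 input, a co-occurrence-based measure (such as the Tanimoto coefficient) still yields scores, whereas a correlation over the rating values has nothing to work with.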

Is this the right way to solve the problem? Is there some other algorithm
that I should be using? Thanks for the help.
