You can do this, but unless you have more than about 100M user-group memberships, it's overkill. The non-Hadoop solution is about 10 lines of code in comparison.
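
To give a rough idea of what such an in-memory solution could look like, here is a minimal sketch in Python. It is only an illustration under assumptions of mine: the input is a list of (user, group) pairs that fits in memory, the function name group_similarities is made up, and Jaccard similarity is just one possible choice of a 0-1 measure (nothing in this thread prescribes it).

```python
from collections import defaultdict
from itertools import combinations


def group_similarities(memberships):
    """Jaccard similarity for every pair of groups sharing at least one user.

    `memberships` is an iterable of (user, group) pairs (hypothetical format).
    """
    # Transpose the data: group -> set of users in that group.
    members = defaultdict(set)
    for user, group in memberships:
        members[group].add(user)

    sims = {}
    # Compare every pair of groups; this is quadratic in the number of groups.
    for g1, g2 in combinations(sorted(members), 2):
        shared = len(members[g1] & members[g2])
        if shared:  # skip pairs with no common users (similarity 0)
            sims[(g1, g2)] = shared / len(members[g1] | members[g2])
    return sims


if __name__ == "__main__":
    pairs = [("u1", "g1"), ("u2", "g1"), ("u2", "g2"), ("u3", "g2"), ("u3", "g3")]
    for (g1, g2), sim in group_similarities(pairs).items():
        print(g1, g2, round(sim, 3))
```

With the toy data above, g2 and g3 share one of two distinct users and score 0.5, while g1 and g3 share nobody and are left out. The all-pairs loop is fine for a modest number of groups; with very many groups you would want to enumerate only the groups that co-occur for some user.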

On Fri, Feb 18, 2011 at 1:14 PM, Radek Maciaszek <[email protected]> wrote:

> Hi Ted,
>
> Thanks for pointing me into the right direction. I just looked up more
> closely on the recommendation wiki and I think I can do something you
> proposed. To quote from this page
> <https://cwiki.apache.org/confluence/display/MAHOUT/Itembased+Collaborative+Filtering>:
>
> "*org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob*
> computes all similar items. It expects a .csv file with the preference
> data as input, where each line represents a single preference in the
> form *userID,itemID,value* and outputs pairs of itemIDs with their
> associated similarity value."
>
> If I will pass the data in format "userId,groupId,1" it should output
> pairs of groupIDs with their similarities - or at least I hope so.
> Sounds easy :)
>
> Many thanks!
> Radek
>
> On 17 February 2011 17:42, Ted Dunning <[email protected]> wrote:
>
>> Yes.
>>
>> Simply transpose your data and then use standard similarity techniques.
>>
>> Transposition in this case means that you would reformulate your data
>> to be
>>
>> group1: user ... user
>>
>> In practice, the standard input form for Mahout recommendations is more
>> like this:
>>
>> user group rating
>>
>> where your ratings will always be 1. Simply redesignation of the two
>> first columns suffices to transpose data like this.
>>
>> On Thu, Feb 17, 2011 at 3:34 AM, Radek Maciaszek <[email protected]>
>> wrote:
>>
>> > I am trying to find a similarities between the groups (not the
>> > users). Some simple similarity metric (e.g. 0-1, close to 0 for not
>> > similar at all, close to 1 very similar) would be ideal. So
>> > essentially I need to calculate such a metric for every pair of
>> > groups.
>> >
>> > Is it something Mahout can help me with?
