Hi Sebastian,

Thanks for the tip. I will try to report back later on how the analysis goes. Hopefully EMR will work fine with all this.

Cheers,
Radek

On 18 February 2011 10:20, Sebastian Schelter wrote:

> Hi Radek,
>
> While this is a nice and creative way to use ItemSimilarityJob, be aware that
> it might prune away some of your data! So either set the parameter
> "maxCooccurrencesPerItem" to a very high number or use RowSimilarityJob
> directly.
>
> --sebastian
>
>
> On 18.02.2011 11:14, Radek Maciaszek wrote:
>
>> Hi Ted,
>>
>> Thanks for pointing me in the right direction. I just looked more
>> closely at the recommendation wiki and I think I can do what you
>> proposed. To quote from this page
>> <https://cwiki.apache.org/confluence/display/MAHOUT/Itembased+Collaborative+Filtering>:
>>
>> "*org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob*
>> computes all similar items. It expects a .csv file with the preference data
>> as input, where each line represents a single preference in the form
>> *userID,itemID,value* and outputs pairs of itemIDs with their associated
>> similarity value."
>>
>> If I pass the data in the format "userId,groupId,1" it should output
>> pairs of groupIDs with their similarities - or at least I hope so. Sounds
>> easy :)
>>
>> Many thanks!
>> Radek
>>
>> On 17 February 2011 17:42, Ted Dunning wrote:
>>
>>> Yes.
>>>
>>> Simply transpose your data and then use standard similarity techniques.
>>>
>>> Transposition in this case means that you would reformulate your data to
>>> be
>>>
>>> group1: user ... user
>>>
>>> In practice, the standard input form for Mahout recommendations is more
>>> like this:
>>>
>>> user group rating
>>>
>>> where your ratings will always be 1. Simply redesignating the first two
>>> columns suffices to transpose data like this.
>>>
>>> On Thu, Feb 17, 2011 at 3:34 AM, Radek Maciaszek wrote:
>>>
>>>> I am trying to find similarities between the groups (not the users).
>>>> Some simple similarity metric (e.g. 0-1, close to 0 for not similar at
>>>> all, close to 1 very similar) would be ideal. So essentially I need to
>>>> calculate such a metric for every pair of groups.
>>>>
>>>> Is it something Mahout can help me with?
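
For a small sample, the idea discussed in this thread (treat each group as an "item" and feed "userId,groupId,1" lines in) can be sanity-checked locally before running anything on EMR. The following is a minimal in-memory Python sketch, not Mahout's implementation: it assumes the "userId,groupId,1" input format quoted above, and it uses the Tanimoto (Jaccard) coefficient as the 0-to-1 similarity metric, where 1 means identical membership and 0 means no shared users. The function names are hypothetical.

```python
import csv
import io
from collections import defaultdict
from itertools import combinations


def load_group_members(lines):
    """Read "userId,groupId,1" lines into {groupId: set of userIds}.

    Grouping users by groupId is the transposition described in the
    thread: each group now plays the role of an "item".
    """
    members = defaultdict(set)
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        user_id, group_id, _value = row  # value is always 1 for memberships
        members[group_id].add(user_id)
    return members


def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient |a & b| / |a | b|, always in [0, 1]."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def group_similarities(members):
    """Return {(groupA, groupB): similarity} for every pair of groups."""
    return {
        (g1, g2): tanimoto(members[g1], members[g2])
        for g1, g2 in combinations(sorted(members), 2)
    }


if __name__ == "__main__":
    # Hypothetical sample data: three users spread over three groups.
    data = io.StringIO(
        "u1,g1,1\n"
        "u2,g1,1\n"
        "u2,g2,1\n"
        "u3,g2,1\n"
        "u1,g3,1\n"
        "u2,g3,1\n"
    )
    sims = group_similarities(load_group_members(data))
    for (g1, g2), s in sorted(sims.items()):
        print(f"{g1},{g2},{s:.3f}")
    # prints:
    # g1,g2,0.333
    # g1,g3,1.000
    # g2,g3,0.333
```

Because this compares every pair of groups in memory, it only suits small samples; for the full dataset, the Hadoop jobs mentioned above (ItemSimilarityJob or RowSimilarityJob) are the intended route, keeping Sebastian's warning about pruning in mind.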
