You can do this, but unless you have more than about 100M user-group
memberships, it's overkill. The non-Hadoop solution is about 10 lines
of code in comparison.
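For illustration, here is one possible in-memory version. It is a minimal sketch in Python that assumes the memberships fit in memory and uses Jaccard similarity (shared members divided by combined members). Jaccard gives the 0-1 scale asked for below, but the thread does not name a metric, and the function name `group_similarities` is hypothetical. The sketch transposes the user/group pairs into per-group member sets and then counts shared users for each pair of groups.

```python
from collections import defaultdict
from itertools import combinations


def group_similarities(memberships):
    """Return {(group_a, group_b): jaccard} for every pair of groups
    that share at least one user.

    `memberships` is an iterable of (user_id, group_id) pairs, i.e. the
    "user group rating" rows with the constant rating of 1 dropped.
    """
    members = defaultdict(set)    # transposed view: group -> its users
    groups_of = defaultdict(set)  # original view: user -> their groups
    for user, group in memberships:
        members[group].add(user)
        groups_of[user].add(group)

    # Count users shared by each pair of groups; sorting makes the
    # pair key independent of insertion order.
    shared = defaultdict(int)
    for groups in groups_of.values():
        for a, b in combinations(sorted(groups), 2):
            shared[(a, b)] += 1

    # Jaccard: |A & B| / |A | B| = shared / (|A| + |B| - shared), in [0, 1].
    return {
        (a, b): n / (len(members[a]) + len(members[b]) - n)
        for (a, b), n in shared.items()
    }


if __name__ == "__main__":
    pairs = [("u1", "g1"), ("u2", "g1"), ("u2", "g2"), ("u3", "g2")]
    print(group_similarities(pairs))  # {('g1', 'g2'): 0.3333333333333333}
```

Pairs of groups with no users in common are omitted and can be treated as similarity 0. Counting overlaps by walking each user's groups avoids comparing every pair of groups, although a user who belongs to many groups still contributes a quadratic number of pairs.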

On Fri, Feb 18, 2011 at 1:14 PM, Radek Maciaszek <[email protected]> wrote:
> Hi Ted,
>
> Thanks for pointing me in the right direction. I just looked more
> closely at the recommendation wiki and I think I can do what you
> proposed. To quote from this page
> <https://cwiki.apache.org/confluence/display/MAHOUT/Itembased+Collaborative+Filtering>:
> "*org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob*
> computes all similar items. It expects a .csv file with the preference
> data as input, where each line represents a single preference in the
> form *userID,itemID,value* and outputs pairs of itemIDs with their
> associated similarity value."
>
> If I pass the data in the format "userId,groupId,1", it should output
> pairs of groupIds with their similarities, or at least I hope so. Sounds
> easy :)
>
> Many thanks!
> Radek
>
> On 17 February 2011 17:42, Ted Dunning <[email protected]> wrote:
>
>> Yes.
>>
>> Simply transpose your data and then use standard similarity techniques.
>>
>> Transposition in this case means that you would reformulate your data to be
>>
>> group1: user ... user
>>
>> In practice, the standard input form for Mahout recommendations is more
>> like this:
>>
>> user group rating
>>
>> where your ratings will always be 1. Simply redesignating the first two
>> columns suffices to transpose data like this.
>>
>> On Thu, Feb 17, 2011 at 3:34 AM, Radek Maciaszek
>> <[email protected]> wrote:
>>
>> > I am trying to find similarities between the groups (not the users).
>> > A simple similarity metric (e.g. on a 0-1 scale, close to 0 for not
>> > similar at all and close to 1 for very similar) would be ideal. So
>> > essentially I need to calculate such a metric for every pair of groups.
>> >
>> > Is it something Mahout can help me with?
>> >
>>
>
