Hi Sebastian,

Thanks for the tip. I will report back later on how the analysis goes.
Hopefully EMR will work fine with all this.
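
A minimal sketch of preparing the input in the "userId,groupId,1" form quoted below, assuming the memberships start out as a mapping from group to users and that all IDs are numeric. The function names and sample data are hypothetical and not part of Mahout:

```python
import csv
import io


def memberships_to_preferences(group_members):
    """Turn {group_id: [user_id, ...]} into (user_id, group_id, 1) rows."""
    rows = []
    for group_id in sorted(group_members):
        # set() drops duplicate memberships so each pair appears once
        for user_id in sorted(set(group_members[group_id])):
            rows.append((user_id, group_id, 1))
    return rows


def write_preferences(rows, out):
    """Write the rows as comma-separated lines, one preference per line."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(rows)


# Hypothetical sample data: three groups and their member user IDs
sample = {10: [1, 2, 3], 20: [2, 3], 30: [4]}

buf = io.StringIO()
write_preferences(memberships_to_preferences(sample), buf)
print(buf.getvalue(), end="")
```

Writing to a real file instead of the in-memory buffer would give the .csv to upload as the job input.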

Cheers,
Radek

On 18 February 2011 10:20, Sebastian Schelter <[email protected]> wrote:

> Hi Radek,
>
> While this is a nice and creative way to use ItemSimilarityJob, be aware that
> it might prune away some of your data! So either set the parameter
> "maxCooccurrencesPerItem" to a very high number or use RowSimilarityJob
> directly.
>
> --sebastian
>
>
> On 18.02.2011 11:14, Radek Maciaszek wrote:
>
>> Hi Ted,
>>
>> Thanks for pointing me in the right direction. I just looked more
>> closely at the recommendation wiki and I think I can do what you
>> proposed. To quote from this page
>> (https://cwiki.apache.org/confluence/display/MAHOUT/Itembased+Collaborative+Filtering):
>>
>> "*org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob*
>> computes all similar items. It expects a .csv file with the preference
>> data as input, where each line represents a single preference in the form
>> *userID,itemID,value* and outputs pairs of itemIDs with their associated
>> similarity value."
>>
>> If I pass the data in the format "userId,groupId,1" it should output pairs
>> of groupIDs with their similarities - or at least I hope so. Sounds easy :)
>>
>> Many thanks!
>> Radek
>>
>> On 17 February 2011 17:42, Ted Dunning <[email protected]> wrote:
>>
>>> Yes.
>>>
>>> Simply transpose your data and then use standard similarity techniques.
>>>
>>> Transposition in this case means that you would reformulate your data to be
>>>
>>> group1: user ... user
>>>
>>> In practice, the standard input form for Mahout recommendations is more
>>> like this:
>>>
>>> user group rating
>>>
>>> where your ratings will always be 1.  Simply redesignating the first two
>>> columns suffices to transpose data like this.
>>>
>>> On Thu, Feb 17, 2011 at 3:34 AM, Radek Maciaszek
>>> <[email protected]> wrote:
>>>
>>>> I am trying to find similarities between the groups (not the users). Some
>>>> simple similarity metric (e.g. 0-1, close to 0 for not similar at all,
>>>> close to 1 for very similar) would be ideal. So essentially I need to
>>>> calculate such a metric for every pair of groups.
>>>>
>>>> Is it something Mahout can help me with?
>>>>
>>>>
>
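
For the 0-1 metric asked about at the start of the thread, here is a rough sketch of one possible choice, the Tanimoto (Jaccard) coefficient over the sets of users in each group. This is a brute-force, in-memory illustration for sanity-checking small samples, not what ItemSimilarityJob or RowSimilarityJob does internally; the function names and sample data are hypothetical:

```python
from itertools import combinations


def tanimoto(a, b):
    """Size of the intersection over size of the union, in the range 0..1."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def group_similarities(group_members):
    """Return {(group_a, group_b): similarity} for every pair of groups."""
    sets = {group: set(users) for group, users in group_members.items()}
    return {
        (g1, g2): tanimoto(sets[g1], sets[g2])
        for g1, g2 in combinations(sorted(sets), 2)
    }


# Hypothetical sample data: three groups and their member user IDs
sample = {10: [1, 2, 3], 20: [2, 3], 30: [4]}
sims = group_similarities(sample)
for pair, value in sorted(sims.items()):
    print(pair, round(value, 3))
```

Groups sharing most of their users score close to 1, and groups with no users in common score 0. The pairwise loop grows quadratically with the number of groups, so it only suits small data sets.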
