Hi Radek,

Looking forward to your report. This job works fine on EMR.

--sebastian

On 18.02.2011 12:12, Radek Maciaszek wrote:
Hi Sebastian,

Thanks for the tip. I will report back later on how the analysis goes. Hopefully EMR will work fine with all this.

Cheers,
Radek

On 18 February 2011 10:20, Sebastian Schelter <[email protected]> wrote:

    Hi Radek,

    While this is a nice and creative way to use ItemSimilarityJob, be
    aware that it might prune away some of your data! So either set
    the parameter "maxCooccurrencesPerItem" to a very high number or
    use RowSimilarityJob directly.

    --sebastian


    On 18.02.2011 11:14, Radek Maciaszek wrote:

        Hi Ted,

        Thanks for pointing me in the right direction. I just looked
        more closely at the recommendation wiki and I think I can do
        what you proposed. To quote from this page
        <https://cwiki.apache.org/confluence/display/MAHOUT/Itembased+Collaborative+Filtering>:


        "*org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob*
        computes all similar items. It expects a .csv file with the
        preference data as input, where each line represents a single
        preference in the form *userID,itemID,value* and outputs pairs
        of itemIDs with their associated similarity value."

        If I pass the data in the format "userId,groupId,1", it should
        output pairs of groupIDs with their similarities - or at least
        I hope so. Sounds easy :)
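
As a rough illustration of the kind of output hoped for here, the following is a minimal Python sketch (not Mahout code) that reads "userId,groupId,1" lines and computes a Tanimoto (Jaccard) coefficient for every pair of groups. The function name `group_similarities`, the sample IDs, and the choice of Tanimoto as the measure are assumptions made for illustration; the Mahout job may use a different measure and runs as a distributed Hadoop job rather than in memory.

```python
from collections import defaultdict
from itertools import combinations


def group_similarities(lines):
    """Return {(group_a, group_b): score} with scores between 0 and 1.

    Each input line is assumed to look like "userId,groupId,1".
    The score is the Tanimoto (Jaccard) coefficient of the two groups'
    member sets: shared members divided by total distinct members.
    """
    members = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user_id, group_id, _value = line.split(",")
        members[group_id].add(user_id)

    similarities = {}
    # Every unordered pair of groups, in a stable (sorted) order.
    for a, b in combinations(sorted(members), 2):
        shared = len(members[a] & members[b])
        total = len(members[a] | members[b])
        similarities[(a, b)] = shared / total
    return similarities


if __name__ == "__main__":
    # Hypothetical sample data: g1 and g2 share two users, g3 shares none.
    sample = [
        "u1,g1,1", "u2,g1,1", "u3,g1,1",
        "u2,g2,1", "u3,g2,1",
        "u4,g3,1",
    ]
    for (a, b), score in group_similarities(sample).items():
        print(f"{a},{b},{score:.3f}")
```

Groups with identical membership score 1 and groups with no users in common score 0, which matches the "close to 0 / close to 1" metric asked for at the start of the thread. Comparing all pairs in memory like this only suits small data sets.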

        Many thanks!
        Radek

        On 17 February 2011 17:42, Ted Dunning <[email protected]> wrote:

            Yes.

            Simply transpose your data and then use standard
            similarity techniques.

            Transposition in this case means that you would
            reformulate your data to be

            group1: user ... user

            In practice, the standard input form for Mahout
            recommendations is more like this:

            user group rating

            where your ratings will always be 1.  Simply redesignating
            the first two columns suffices to transpose data like this.
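
To make the reformulation concrete, here is a small Python sketch, assuming the memberships start out as a mapping from each group to its users. The function name `to_preference_lines` and the sample IDs are hypothetical. It emits one "user,group,1" line per membership, so that the group column takes the place of the item column.

```python
def to_preference_lines(group_to_users):
    """Turn {group: [user, ...]} into "user,group,1" lines.

    The rating is always 1 because membership is a yes/no fact.
    """
    lines = []
    for group_id, user_ids in sorted(group_to_users.items()):
        for user_id in sorted(user_ids):
            lines.append(f"{user_id},{group_id},1")
    return lines


if __name__ == "__main__":
    # Hypothetical input in the "group1: user ... user" shape.
    groups = {"group1": ["userA", "userB"], "group2": ["userB"]}
    print("\n".join(to_preference_lines(groups)))
```

Because the group sits in the second column, an item-similarity computation over these lines compares groups rather than users.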

            On Thu, Feb 17, 2011 at 3:34 AM, Radek Maciaszek <[email protected]> wrote:

                I am trying to find similarities between the groups
                (not the users). Some simple similarity metric (e.g.
                0-1, close to 0 for not similar at all, close to 1
                for very similar) would be ideal. So essentially I
                need to calculate such a metric for every pair of
                groups.

                Is it something Mahout can help me with?





