Hi Radek,
Looking forward to your report. This job works fine on EMR.
--sebastian
On 18.02.2011 12:12, Radek Maciaszek wrote:
Hi Sebastian,
Thanks for the tip. I will report back later on how the analysis goes.
Hopefully EMR will work fine with all this.
Cheers,
Radek
On 18 February 2011 10:20, Sebastian Schelter wrote:
Hi Radek,
While this is a nice and creative way to use ItemSimilarityJob, be
aware that it might prune away some of your data! So either set
the parameter "maxCooccurrencesPerItem" to a very high number or
use RowSimilarityJob directly.
--sebastian
On 18.02.2011 11:14, Radek Maciaszek wrote:
Hi Ted,
Thanks for pointing me in the right direction. I looked more closely
at the recommendation wiki and I think I can do what you proposed.
To quote from this page
<https://cwiki.apache.org/confluence/display/MAHOUT/Itembased+Collaborative+Filtering>:
"*org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob*
computes all similar items. It expects a .csv file with the preference
data as input, where each line represents a single preference in the
form *userID,itemID,value* and outputs pairs of itemIDs with their
associated similarity value."
If I pass the data in the format "userId,groupId,1", it should output
pairs of groupIDs with their similarities, or at least I hope so.
Sounds easy :)
Many thanks!
Radek
On 17 February 2011 17:42, Ted Dunning wrote:
Yes.
Simply transpose your data and then use standard similarity techniques.
Transposition in this case means that you would reformulate your data
to be
group1: user ... user
In practice, the standard input form for Mahout recommendations is more
like this:
user group rating
where your ratings will always be 1. Simply redesignating the first two
columns suffices to transpose data like this.
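As a rough illustration of the transposition described above, the following Python sketch flattens a group-to-users mapping into "user,group,1" lines, with the group standing in for the item. The function name to_preference_lines and the sample IDs are assumptions made up for this example; they do not come from Mahout.

```python
def to_preference_lines(group_members):
    """Flatten {group_id: [user_id, ...]} into 'user,group,1' CSV lines."""
    lines = []
    for group_id in sorted(group_members):
        # Deduplicate users so each (user, group) pair appears only once.
        for user_id in sorted(set(group_members[group_id])):
            # The group takes the place of the item; the rating is always 1.
            lines.append(f"{user_id},{group_id},1")
    return lines


# Hypothetical sample data: group ID -> IDs of the users in that group.
sample = {10: [1, 2, 3], 20: [2, 3], 30: [4]}
preference_lines = to_preference_lines(sample)
```

Writing these lines to a .csv file should give input in the userID,itemID,value form quoted from the wiki.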
On Thu, Feb 17, 2011 at 3:34 AM, Radek Maciaszek wrote:
I am trying to find similarities between the groups (not the users).
Some simple similarity metric (e.g. in the range 0-1, close to 0 for
not similar at all, close to 1 for very similar) would be ideal. So
essentially I need to calculate such a metric for every pair of groups.
Is it something Mahout can help me with?
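For a small data set, one metric of this kind can be sketched in plain Python: the Jaccard (Tanimoto) coefficient, which is the number of users two groups share divided by the number of users in either group. This only illustrates the metric on in-memory data and is not Mahout's implementation; the function names and sample IDs are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Size of the intersection divided by size of the union, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_group_similarity(group_members):
    """Return {(group_a, group_b): similarity} for every pair of groups."""
    sets = {g: set(users) for g, users in group_members.items()}
    return {
        (g1, g2): jaccard(sets[g1], sets[g2])
        for g1, g2 in combinations(sorted(sets), 2)
    }


# Hypothetical sample data: group ID -> IDs of the users in that group.
sample = {10: [1, 2, 3], 20: [2, 3], 30: [4]}
similarities = pairwise_group_similarity(sample)
```

Groups with no users in common score 0 and groups with identical membership score 1. Comparing every pair this way grows quadratically with the number of groups, so it is only practical for small inputs.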