This looks like a simple collaborative filtering problem, or at least can be solved that way. It's not even recommendation, just an item similarity problem.
Users are users and groups are items. You are just computing item-item similarity based on some metric, and there are several implemented in the library. Forget Hadoop for now, as I doubt this is anywhere near the scale where you need it.

For a quick solution, make a file of "userID,groupID" entries, one line per membership. Create a FileDataModel on top of it. Then instantiate, for example, LogLikelihoodSimilarity on top of that. It will score the "similarity" between any two groups based on membership. The result is between 0 and 1.

On Thu, Feb 17, 2011 at 2:34 PM, Radek Maciaszek <[email protected]> wrote:
> Hello,
>
> I have a following problem and I am trying to figure out if using Mahout is
> a good idea for this or perhaps there may be a much simpler approach.
>
> Consider I have users who can belong to many groups:
> user1: group1, group2
> user2: group2
> user3: group2, group3
> ... and millions more
>
> I am trying to find a similarities between the groups (not the users). Some
> simple similarity metric (e.g. 0-1, close to 0 for not similar at all, close
> to 1 very similar) would be ideal. So essentially I need to calculate such a
> metric for every pair of groups.
>
> Is it something Mahout can help me with?
>
> Many thanks,
> Radek
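
To make the scoring concrete, here is a rough standalone sketch of a log-likelihood-based similarity between two groups, in plain Java with no Mahout dependency. It is only an illustration of the idea, not the library's code: the class name GroupSimilaritySketch and all of its methods are made up for this example, and the final squashing of the log-likelihood ratio into the 0..1 range as 1 - 1/(1 + LLR) is an assumption about how such a score can be normalized. It reads the same "userID,groupID" lines described in the reply, with numeric IDs assumed.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Mahout.
class GroupSimilaritySketch {

    // x * ln(x), using the convention that 0 * ln(0) is 0.
    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized entropy of a list of counts: N ln N - sum(x ln x).
    static double entropy(long... counts) {
        long sum = 0;
        double parts = 0.0;
        for (long c : counts) {
            sum += c;
            parts += xLogX(c);
        }
        return xLogX(sum) - parts;
    }

    // Log-likelihood ratio for a 2x2 table of counts:
    // k11 = in both groups, k12 = only in A, k21 = only in B, k22 = in neither.
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Clamp tiny negative values caused by floating-point rounding.
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    // Score in [0, 1) for two groups, given their member sets and the total user count.
    static double similarity(Set<Long> a, Set<Long> b, long totalUsers) {
        long both = 0;
        for (Long user : a) {
            if (b.contains(user)) {
                both++;
            }
        }
        long onlyA = a.size() - both;
        long onlyB = b.size() - both;
        long neither = totalUsers - both - onlyA - onlyB;
        double llr = logLikelihoodRatio(both, onlyA, onlyB, neither);
        return 1.0 - 1.0 / (1.0 + llr); // assumed normalization into 0..1
    }

    // Parse "userID,groupID" lines into groupID -> set of member userIDs.
    static Map<Long, Set<Long>> membersByGroup(List<String> lines) {
        Map<Long, Set<Long>> groups = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            long userId = Long.parseLong(parts[0].trim());
            long groupId = Long.parseLong(parts[1].trim());
            groups.computeIfAbsent(groupId, k -> new HashSet<>()).add(userId);
        }
        return groups;
    }

    // Number of distinct users that appear in any group.
    static long countUsers(Map<Long, Set<Long>> groups) {
        Set<Long> users = new HashSet<>();
        for (Set<Long> members : groups.values()) {
            users.addAll(members);
        }
        return users.size();
    }
}
```

Calling similarity(groups.get(g1), groups.get(g2), countUsers(groups)) for each pair of group IDs gives the pairwise scores. Note that a log-likelihood ratio measures how far the overlap is from what chance would predict, so a group that contains nearly every user scores close to 0 against everything, and the scores only become meaningful with a reasonable amount of data.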
