Hello,

I'm new to the Mahout world. It seems really nice, but it's hard to find beginner-friendly documentation :(
I'm trying to run some clustering. Let me explain what I'm trying to achieve.

I have a DB table with these columns:

    shop_id (string), customer_category (string), num_of_purchases (integer)

What I want to do is discover groups of shops that are related because they have some customer categories in common.

I think the vectors should look like this:

    "shop #1" = (1, 10, 0, 20)

which means that customers in category A bought 1 item in the shop, customers in category B bought 10 items, customers in category C bought nothing, and customers in category D bought 20 items.

In my DB, this example is stored as:

    shop_id   | customer_category | num_of_purchases
    ----------+-------------------+------------------
    "shop #1" | "A"               | 1
    "shop #1" | "B"               | 10
    "shop #1" | "D"               | 20

I think I must convert this to an ARFF file like:

    @RELATION purchases

    @ATTRIBUTE shop_id STRING
    @ATTRIBUTE catA NUMERIC
    @ATTRIBUTE catB NUMERIC
    @ATTRIBUTE catC NUMERIC
    @ATTRIBUTE catD NUMERIC

    @DATA
    "shop #1",1,10,0,20
    ...

Why an ARFF file? Because I can use its helpful sparse syntax, in which zero values are simply left out. But this file is difficult to build by hand, so I think I should write a script to generate it.

My question is: am I heading in the right direction? I would appreciate some help!

Thanks :)

Regards,

--
Clément Notin
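
For reference, the conversion script mentioned above could be sketched roughly as follows. This is only a minimal illustration in Python, not a Mahout API: the function name rows_to_sparse_arff and its parameters are hypothetical, the rows are assumed to have already been read from the DB as (shop_id, customer_category, num_of_purchases) tuples, and the output uses the sparse ARFF data syntax ({index value, ...} with 0-based attribute indices). Whether a given Mahout version's ARFF converter accepts a STRING attribute inside a sparse row is something to verify before relying on it.

```python
from collections import defaultdict


def rows_to_sparse_arff(rows, categories=None, relation="purchases"):
    """Pivot (shop_id, customer_category, num_of_purchases) rows into sparse ARFF text.

    Hypothetical helper for illustration only. Shop ids are written inside
    double quotes without any escaping, so ids containing quotes need extra care.
    """
    # Use the caller's category list if given, otherwise derive it from the data.
    if categories is None:
        categories = sorted({category for _, category, _ in rows})
    # Attribute 0 is shop_id, so category columns start at index 1.
    column = {category: i + 1 for i, category in enumerate(categories)}

    # Sum purchases per (shop, category) in case a pair appears more than once.
    purchases = defaultdict(lambda: defaultdict(int))
    for shop_id, category, count in rows:
        purchases[shop_id][category] += count

    lines = [f"@RELATION {relation}", "", "@ATTRIBUTE shop_id STRING"]
    lines += [f"@ATTRIBUTE cat{category} NUMERIC" for category in categories]
    lines += ["", "@DATA"]

    for shop_id in sorted(purchases):
        cells = [f'0 "{shop_id}"']
        # Emit only non-zero counts, ordered by attribute index.
        for category in sorted(purchases[shop_id], key=column.get):
            value = purchases[shop_id][category]
            if value:
                cells.append(f"{column[category]} {value}")
        lines.append("{" + ", ".join(cells) + "}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [("shop #1", "A", 1), ("shop #1", "B", 10), ("shop #1", "D", 20)]
    print(rows_to_sparse_arff(sample, categories=["A", "B", "C", "D"]), end="")
```

Passing the full category list explicitly keeps the column positions stable even when a category (here C) has no purchases in the data being exported; without it, the columns are derived from whatever categories actually appear in the rows.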
