Hello,

I'm new to the Mahout world. It looks really nice, but it's hard to find
beginner-friendly documentation :(

I'm trying to run some clustering. Let me explain what I'm trying to achieve.
I have a DB table with the columns shop_id (string), customer_category (string)
and num_of_purchases (integer).
What I want to do is discover groups of shops that are related because they
have some customer categories in common.

I think the vectors should look like this:
"shop #1" = (1, 10, 0, 20)
where each position is one customer category (A, B, C, D): customers of
category A bought 1 item in the shop, customers of category B bought 10 items,
category C bought nothing, and so on.
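To make the vector idea concrete, here is a minimal Python sketch. It assumes a fixed category order A, B, C, D, so each category always maps to the same dimension. It also uses cosine similarity as one possible way to measure how much two shops share customer categories. Mahout ships a CosineDistanceMeasure among its distance measures, but picking it here is my assumption, not a requirement of the clustering. "shop #2" and its counts are invented purely for illustration.

```python
import math

# Assumed fixed category order: position i of every shop vector is CATEGORIES[i].
CATEGORIES = ["A", "B", "C", "D"]

def shop_vector(purchases: dict[str, int]) -> list[int]:
    """Dense vector of purchase counts, 0 for categories with no purchases."""
    return [purchases.get(cat, 0) for cat in CATEGORIES]

def cosine_similarity(u: list[int], v: list[int]) -> float:
    """Close to 1.0 = similar purchase profile, 0.0 = no category in common."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty shop is treated as unrelated to everything
    return dot / (norm_u * norm_v)

shop1 = shop_vector({"A": 1, "B": 10, "D": 20})
shop2 = shop_vector({"C": 5})  # hypothetical shop sharing no category with shop #1
print(shop1)                            # [1, 10, 0, 20]
print(cosine_similarity(shop1, shop2))  # 0.0
```

Cosine similarity ignores the overall volume of a shop and compares only the mix of categories. That seems close to "shops related because they have customer categories in common", but a plain Euclidean distance would be an equally valid choice if volume matters.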

In my DB, for this example, I have:

shop_id   | customer_category | num_of_purchases
----------+-------------------+-----------------
"shop #1" | "A"               | 1
"shop #1" | "B"               | 10
"shop #1" | "D"               | 20
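Going from rows like these to one vector per shop is a pivot. Below is a minimal Python sketch, assuming the rows arrive as (shop_id, customer_category, num_of_purchases) tuples, for example straight from a DB cursor. The function name pivot is hypothetical. Summing duplicate (shop, category) rows is also my assumption about how they should be treated.

```python
from collections import defaultdict

# (shop_id, customer_category, num_of_purchases) rows from the sample table.
rows = [
    ("shop #1", "A", 1),
    ("shop #1", "B", 10),
    ("shop #1", "D", 20),
]

def pivot(rows, categories=None):
    """Turn (shop, category, count) rows into {shop: dense count vector}.

    If no category list is given, it is derived from the rows, so a category
    that never appears (C in this sample) gets no dimension at all.
    """
    if categories is None:
        categories = sorted({cat for _, cat, _ in rows})
    index = {cat: i for i, cat in enumerate(categories)}
    vectors = defaultdict(lambda: [0] * len(categories))
    for shop, cat, count in rows:
        vectors[shop][index[cat]] += count  # assumption: duplicate rows are summed
    return categories, dict(vectors)

categories, vectors = pivot(rows, categories=["A", "B", "C", "D"])
print(categories)          # ['A', 'B', 'C', 'D']
print(vectors["shop #1"])  # [1, 10, 0, 20]
```

Passing the full category list explicitly keeps every shop vector the same length and in the same order, which clustering needs.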


I think I need to convert this into an ARFF file like this:

@RELATION purchases
@ATTRIBUTE shop_id STRING
@ATTRIBUTE catA NUMERIC
@ATTRIBUTE catB NUMERIC
@ATTRIBUTE catC NUMERIC
@ATTRIBUTE catD NUMERIC

@DATA
"shop #1",1,10,0,20
...

Why an ARFF file? Because it supports the handy sparse syntax, so the zero
counts don't have to be written out. But building this file is difficult, so I
think I should write a script.
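A script along these lines could look like the minimal Python sketch below. It assumes the rows have already been grouped into {shop_id: {category: count}}. It writes attribute names catA, catB, ... following the example header above. It uses the sparse ARFF syntax, where each data line is {index value, ...} with 0-based attribute indices and any attribute left out means 0. The function names are hypothetical. The quoting rule (backslash-escape backslashes and double quotes) is my assumption about what the ARFF reader accepts. Whether Mahout's ARFF-to-vector tooling handles the STRING attribute the way you want is worth checking against its documentation.

```python
def arff_quote(value: str) -> str:
    """Quote a string for ARFF (assumed rule: escape backslashes and double quotes)."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

def to_sparse_arff(shops, categories, relation="purchases"):
    """Render {shop_id: {category: count}} as a sparse ARFF document."""
    lines = [f"@RELATION {relation}", "@ATTRIBUTE shop_id STRING"]
    lines += [f"@ATTRIBUTE cat{cat} NUMERIC" for cat in categories]
    lines += ["", "@DATA"]
    for shop_id, counts in shops.items():
        # Attribute 0 is shop_id, so the i-th category (1-based) sits at index i.
        entries = [f"0 {arff_quote(shop_id)}"]
        for i, cat in enumerate(categories, start=1):
            count = counts.get(cat, 0)
            if count:  # sparse syntax: zero values are simply left out
                entries.append(f"{i} {count}")
        lines.append("{" + ", ".join(entries) + "}")
    return "\n".join(lines) + "\n"

shops = {"shop #1": {"A": 1, "B": 10, "D": 20}}
print(to_sparse_arff(shops, ["A", "B", "C", "D"]), end="")
```

For the example shop, the data line comes out as {0 "shop #1", 1 1, 2 10, 4 20}. The zero count for category C (attribute index 3) is omitted.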


My question is: am I heading in the right direction?
I would appreciate some help! Thanks :)

Regards,

-- 
Clément Notin
