I'm working on Market Basket Analysis.
The "small" data set consists of 40,000 transactions (baskets) and 35
categories, while the large data set has about 30 million baskets and
400 categories.
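
To make the 1-of-n suggestion quoted below concrete for basket data, here is a minimal sketch in plain Python (not Mahout code). It assumes each basket becomes a 0/1 indicator vector with one slot per category, which is the 1-of-n idea applied to a set of items rather than to a single variable. The category names and baskets are made up for illustration.

```python
import math

def one_hot_basket(basket, categories):
    """Return a 0/1 vector with one slot per category (1.0 if present)."""
    present = set(basket)
    return [1.0 if c in present else 0.0 for c in categories]

def euclidean(u, v):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical categories and baskets, for illustration only.
categories = ["bread", "milk", "beer", "diapers"]
baskets = [
    ["bread", "milk"],
    ["beer", "diapers"],
    ["bread", "milk", "beer"],
]

vectors = [one_hot_basket(b, categories) for b in baskets]

# Baskets that share more categories end up closer together.
d_similar = euclidean(vectors[0], vectors[2])    # differ in one category
d_different = euclidean(vectors[0], vectors[1])  # differ in all four
```

On 0/1 vectors the squared Euclidean distance is just the number of categories in which two baskets differ, so with 35 or 400 categories the vectors stay small and k-means on them is at least feasible to try.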


On Mon, May 6, 2013 at 9:17 PM, Ted Dunning <[email protected]> wrote:

> It really depends on your data, but anything that works on text has at
> least a potential for working on categorical data.
>
> It is common to use a 1-of-n encoding for categorical data and then simply
> use Euclidean distance with something like k-means.
>
> Can you say something about how many variables and how many categories the
> variables have?
>
>
> On Mon, May 6, 2013 at 9:49 AM, Florents Tselai
> <[email protected]> wrote:
>
> > Hello,
> >
> > Are there any suggestions on which Mahout algorithms to use
> > for clustering categorical data?
> >
>
