Pretty much.

Or you can use a standard distance measure and just encode your data
cleverly.

For instance, for color, I would recommend 1-of-n encoding, where each of
your n colors is converted to a vector of n values, only one of which is
non-zero.
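
A minimal sketch of that encoding in plain Python, not Mahout code. The
color list and the helper names (one_of_n, squared_euclidean) are made up
for illustration. It shows that after encoding, a standard Euclidean-style
distance treats every pair of different colors as equally far apart and
identical colors as distance 0, so no artificial ordering of colors is
needed.

```python
# Hypothetical color vocabulary, for illustration only.
COLORS = ["red", "green", "blue", "purple"]


def one_of_n(value, vocabulary):
    """Return a vector with 1.0 in the slot for `value` and 0.0 elsewhere."""
    if value not in vocabulary:
        raise ValueError(f"unknown value: {value!r}")
    return [1.0 if v == value else 0.0 for v in vocabulary]


def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


blue = one_of_n("blue", COLORS)
purple = one_of_n("purple", COLORS)
red = one_of_n("red", COLORS)

# Same color: distance 0. Any two different colors: the same constant (2.0),
# because the vectors differ in exactly two slots.
same = squared_euclidean(blue, blue)
blue_purple = squared_euclidean(blue, purple)
blue_red = squared_euclidean(blue, red)
```

Several categorical fields (color, TV channel, phone brand) can each be
encoded this way and the resulting vectors concatenated into one vector
per customer.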

2011/8/18 Clément Notin <[email protected]>

> Thank you !
>
> I understand a lot better now.
>
> For the clustering I should write my own distance measure class rather
> than try to give numerical values to colors.
>
> 2011/8/18 Ted Dunning <[email protected]>
>
> > Just the opposite.  Frequent itemset would discover groups of tv channels
> > and colors that occur together.  That might be slightly interesting, but
> > probably not so useful.
> >
> > For that you want clustering, but you will have to decide how similar
> > colors are.  You might just say that if they are the same, distance is 0
> > while different means distance 1.
> >
> > Or you could do SVD first and then cluster (that is spectral clustering,
> > ish).
> >
> > 2011/8/18 Clément Notin <[email protected]>
> >
> > > Ok thanks !
> > >
> > > So if I want to discover groups of customers based on, for example,
> > > their favorite color, their favorite TV channel and the brand of their
> > > cellular phone (it's an example...), should I use frequent itemset
> > > mining instead of clustering?
> > >
> > > 2011/8/17 Ted Dunning <[email protected]>
> > >
> > > > Both clustering and frequent itemset algorithms are unsupervised
> > > > learning methods.
> > > >
> > > > Clustering uses your definition of near and far to find (hopefully)
> > > > clumps of data.
> > > >
> > > > Frequent item-set analysis looks for cases where items cooccur.  The
> > > > origin is in what is called market-basket analysis, where the goal was
> > > > to find items that are commonly purchased together.
> > > >
> > > > For most purposes, I recommend simple cooccurrence analysis.
> > > >
> > > > I think that your confusion stems from you telling the frequent
> > > > itemset code to find item characteristics that often occur together on
> > > > the same item.  That probably isn't what you want.
> > > >
> > > > 2011/8/17 Clément Notin <[email protected]>
> > > >
> > > > > Hello Mahout !
> > > > >
> > > > > I'm unable to find the answer (trust me, I tried!) to a simple
> > > > > question: what is the difference between clustering and frequent
> > > > > itemset mining?
> > > > >
> > > > > I think that frequent itemset mining could help me to cluster things
> > > > > based on colors or other non-numerical characteristics. I thought
> > > > > about converting these values to numbers, but it doesn't always make
> > > > > sense (what order should I use? Blue is near purple, OK, so blue = 1
> > > > > and purple = 2, but is this car, for example, near that one?).
> > > > >
> > > > > Thanks for reading.
> > > > > Regards,
> > > > >
> > > > > --
> > > > > *Clément **Notin*
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > *Clément **Notin*
> > >
> >
>
>
>
> --
> *Clément **Notin*
>
