Pretty much. Or you can use a standard distance measure and just encode your data cleverly.
For instance, for color, I would recommend 1 of n encoding, where each of your n colors is converted to n values, only one of which is non-zero.

2011/8/18 Clément Notin <[email protected]>

> Thank you !
>
> I understand a lot better now.
>
> For the clustering I should write my own distance measure class, not try to
> give numerical values to colors.
>
> 2011/8/18 Ted Dunning <[email protected]>
>
> > Just the opposite. Frequent itemset would discover groups of tv channels
> > and colors that occur together. That might be slightly interesting, but
> > probably not so useful.
> >
> > For that you want clustering, but you will have to decide how similar
> > colors are. You might just say that if they are the same, distance is 0
> > while different means distance 1.
> >
> > Or you could do SVD first and then cluster (that is spectral clustering,
> > ish).
> >
> > 2011/8/18 Clément Notin <[email protected]>
> >
> > > Ok thanks !
> > >
> > > So if I want to discover groups of customers based on, for example,
> > > their favorite color, their favorite TV channel and the brand of their
> > > cellular phone (it's an example...) should I use frequent itemset
> > > mining instead of clustering ?
> > >
> > > 2011/8/17 Ted Dunning <[email protected]>
> > >
> > > > Both clustering and frequent itemset algorithms are unsupervised
> > > > learning methods.
> > > >
> > > > Clustering uses your definition of near and far to find (hopefully)
> > > > clumps of data.
> > > >
> > > > Frequent item-set analysis looks for cases where items cooccur. The
> > > > origin is in what is called market-basket analysis where the goal was
> > > > to find items that are commonly purchased together.
> > > >
> > > > For most purposes, I recommend simple cooccurrence analysis.
> > > >
> > > > I think that your confusion stems from you telling the frequent
> > > > itemset code to find item characteristics that often occur together
> > > > on the same item. That probably isn't what you want.
> > > >
> > > > 2011/8/17 Clément Notin <[email protected]>
> > > >
> > > > > Hello Mahout !
> > > > >
> > > > > I'm unable to find the answer (trust me, I tried !) to a simple
> > > > > question : what is the difference between clustering and frequent
> > > > > itemset mining ?
> > > > >
> > > > > I think that frequent itemset mining could help me to cluster
> > > > > things based on colors or other non-numerical characteristics. I
> > > > > thought about converting these values to numbers but it doesn't
> > > > > always make sense (what order should I use ? blue is near purple ok
> > > > > so blue = 1 and purple = 2 but is this car, for example, near that
> > > > > one ?).
> > > > >
> > > > > Thanks for reading.
> > > > > Regards,
> > > > >
> > > > > --
> > > > > Clément Notin
> > >
> > > --
> > > Clément Notin
>
> --
> Clément Notin
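
P.S. To make the 1 of n encoding from the top of this message concrete, here is a minimal sketch in plain Python (not Mahout code). The customer records, attribute names, and helper functions are all made up for illustration. It shows that with this encoding a standard squared Euclidean distance comes out as exactly twice the number of attributes on which two customers differ, so it ranks pairs the same way as the "same is 0, different is 1" rule.

```python
# Hypothetical customer records; attributes and values are invented for this example.
customers = [
    {"color": "blue", "channel": "news", "phone": "acme"},
    {"color": "purple", "channel": "news", "phone": "acme"},
    {"color": "blue", "channel": "sports", "phone": "zenith"},
]


def build_vocabulary(records):
    """Assign one vector position to every (attribute, value) pair seen in the data."""
    pairs = sorted({(key, value) for record in records for key, value in record.items()})
    return {pair: index for index, pair in enumerate(pairs)}


def one_hot(record, vocab):
    """1 of n encode a record: exactly one non-zero entry per attribute."""
    vector = [0.0] * len(vocab)
    for key, value in record.items():
        vector[vocab[(key, value)]] = 1.0
    return vector


def squared_euclidean(a, b):
    """Standard squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mismatch_distance(r1, r2):
    """Distance 0 for each matching attribute, 1 for each differing attribute."""
    return sum(1 for key in r1 if r1[key] != r2[key])


vocab = build_vocabulary(customers)
vectors = [one_hot(customer, vocab) for customer in customers]

for i in range(len(customers)):
    for j in range(i + 1, len(customers)):
        print(i, j,
              squared_euclidean(vectors[i], vectors[j]),
              mismatch_distance(customers[i], customers[j]))
```

Each differing attribute flips two vector positions (one goes from 1 to 0, another from 0 to 1), which is where the factor of two comes from. If some attributes should count more than others, scaling their non-zero value before clustering is one way to do it.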
