It really depends on your data, but anything that works on text has at least a potential for working on categorical data.
It is common to use a 1-of-n encoding for categorical data and then simply use Euclidean distance with something like k-means.

Can you say something about how many variables and how many categories the variables have?

On Mon, May 6, 2013 at 9:49 AM, Florents Tselai <[email protected]> wrote:

> Hello,
>
> Are there any suggestions on what mahout algorithms (from mahout) to use
> for clustering categorical data?
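
To make the 1-of-n suggestion concrete, here is a minimal sketch in plain Python. It is not Mahout code: the helper names (build_vocabulary, one_hot, euclidean) and the toy records are made up for illustration. Each (column, category) pair gets its own dimension, so the resulting vectors can be handed to any Euclidean-distance clusterer such as k-means.

```python
from math import sqrt

def build_vocabulary(records):
    """Assign one vector position to every (column index, category) pair seen."""
    pairs = sorted({(i, value) for row in records for i, value in enumerate(row)})
    return {pair: pos for pos, pair in enumerate(pairs)}

def one_hot(row, vocabulary):
    """Encode one record as a 0/1 vector with a single 1 per categorical column."""
    vec = [0.0] * len(vocabulary)
    for i, value in enumerate(row):
        vec[vocabulary[(i, value)]] = 1.0
    return vec

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical toy data: two categorical variables (colour, size).
records = [
    ("red", "small"),
    ("red", "large"),
    ("blue", "large"),
]

vocabulary = build_vocabulary(records)
vectors = [one_hot(r, vocabulary) for r in records]

# Distances between the encoded records.
d01 = euclidean(vectors[0], vectors[1])  # records differ in one variable
d02 = euclidean(vectors[0], vectors[2])  # records differ in both variables
```

With this encoding, two records that differ in m variables are at distance sqrt(2 * m), so Euclidean distance is just counting mismatched attributes. The vector length is the total number of categories across all variables, which is why the number of variables and categories matters for whether this approach stays practical.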
