It really depends on your data, but anything that works on text has at
least the potential to work on categorical data.

It is common to use a 1-of-n encoding for categorical data and then simply
use Euclidean distance with something like k-means.
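
To make the encoding concrete, here is a minimal sketch in plain Python
rather than Mahout code. The toy rows and the helper names one_hot_encode
and euclidean are made up for illustration. Each categorical variable
becomes one 0/1 slot per category, so two records that differ in k
variables end up at Euclidean distance sqrt(2k).

```python
import math


def one_hot_encode(rows):
    """Encode rows of categorical values as 1-of-n (one-hot) vectors."""
    n_cols = len(rows[0])
    # Sorted distinct values per column give each category a stable slot.
    categories = [sorted({row[j] for row in rows}) for j in range(n_cols)]
    vectors = []
    for row in rows:
        vec = []
        for j, value in enumerate(row):
            # Exactly one slot per variable is set to 1.0.
            vec.extend(1.0 if value == c else 0.0 for c in categories[j])
        vectors.append(vec)
    return vectors, categories


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical toy data: (color, size)
rows = [
    ("red", "small"),
    ("red", "large"),
    ("blue", "large"),
]

vectors, categories = one_hot_encode(rows)
d_one = euclidean(vectors[0], vectors[1])  # records differ in one variable
d_two = euclidean(vectors[0], vectors[2])  # records differ in two variables
```

The encoded vectors are what you would then hand to k-means. Note that
the vector length is the total number of categories across all
variables, so high-cardinality variables make the vectors long and
sparse.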

Can you say something about how many variables you have and how many
categories each variable has?


On Mon, May 6, 2013 at 9:49 AM, Florents Tselai
<[email protected]> wrote:

> Hello,
>
> Are there any suggestions on which Mahout algorithms to use for
> clustering categorical data?
>
