Hi,

I'm a little confused about Mahout's clustering algorithms. I would like to
cluster data that has an id column. How can I do that?
For example, I would like to run K-Means clustering on the Iris data set (
http://archive.ics.uci.edu/ml/datasets/Iris), which has four numerical
columns. I generated an id column to identify the records, and when the
clustering is done I would like to see which cluster each id ended up in.
When I examined the code, I realized that I can create DenseVector instances
(with the four numerical columns, without the id) and write them in
VectorWritable format. These were my input data. After I managed to run
K-Means, I got IntWritable/WeightedVectorWritable key/value pairs, where the
keys tell me the cluster ID. Is it possible to carry the id attribute through
somehow? Or is the order of the output data the same as the order of the
input data? Can anyone confirm this?
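
To make the question concrete, here is a minimal plain-Java sketch of the behaviour being asked about. It does not use Mahout at all: the class name IdCarryingSketch, the Labeled record type, the ids "iris-1" and "iris-101", and the two centroids are all assumptions made up for illustration. The only point is that when the id travels together with the vector, the result no longer depends on the output order.

```java
import java.util.ArrayList;
import java.util.List;

public class IdCarryingSketch {

    // Hypothetical record type: a generated id plus the four Iris measurements.
    static final class Labeled {
        final String id;
        final double[] features;

        Labeled(String id, double[] features) {
            this.id = id;
            this.features = features;
        }
    }

    // Squared Euclidean distance between two points of equal length.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Index of the centroid closest to the given point.
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    // Produces one "id -> clusterId" line per record. Because the id is stored
    // next to the vector, the mapping stays correct in any output order.
    static List<String> assign(List<Labeled> records, double[][] centroids) {
        List<String> out = new ArrayList<>();
        for (Labeled r : records) {
            out.add(r.id + " -> " + nearest(r.features, centroids));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Labeled> records = new ArrayList<>();
        // Two rows from the Iris data set; the ids are generated, not part of the data.
        records.add(new Labeled("iris-1", new double[] {5.1, 3.5, 1.4, 0.2}));
        records.add(new Labeled("iris-101", new double[] {6.3, 3.3, 6.0, 2.5}));

        // Made-up centroids standing in for a finished K-Means run.
        double[][] centroids = {
            {5.0, 3.4, 1.5, 0.2},
            {6.5, 3.0, 5.5, 2.0}
        };

        for (String line : assign(records, centroids)) {
            System.out.println(line);
        }
    }
}
```

With the K-Means output described above, the equivalent would be for each output vector to still carry its id, so that the clusterID key can be matched to the original record without relying on ordering.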

Thank you very much,
Gabor Makrai
