Thank you very much! NamedVector should solve my problem! By the way, I'm always amazed at how quickly answers arrive on the Hadoop lists!
Thank you,
Gabor

On Mon, Jul 11, 2011 at 3:51 AM, Lance Norskog <[email protected]> wrote:
> The NamedVector class adds a string to any vector, forwarding all
> methods to the wrapped vector. You can cluster these, and then pull
> the strings. The clustering algorithm operates on the wrapped vector.
>
> Lance
>
> On Sun, Jul 10, 2011 at 4:18 PM, Gabor Makrai <[email protected]> wrote:
> > Hi,
> >
> > I'm a little bit confused about Mahout's clustering algorithms. I'd like to
> > cluster data that has an ID column. How can I do that?
> > For example, I'd like to run K-Means clustering on the Iris data set
> > (http://archive.ics.uci.edu/ml/datasets/Iris), where I've got four numerical
> > columns. I generated an ID column to identify the records, and when the
> > clustering is done, I'd like to see the results.
> > When I examined the code, I realized that I can create DenseVector instances
> > (with the four numerical columns, without the ID) and write those in
> > VectorWritable format. These were my input data. After I managed to run
> > K-Means, I got IntWritable/WeightedVectorWritable key/value pairs, where the
> > keys tell me the cluster ID. Is it possible to handle an ID attribute somehow?
> > Maybe the order of the output data is the same as the input data? Can anyone
> > confirm this?
> >
> > Thank you very much,
> > Gabor Makrai
>
> --
> Lance Norskog
> [email protected]
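
For readers following the thread, the idea Lance describes (attach the record ID to the vector, cluster on the numbers only, then read the ID back) can be sketched in plain Java. This is only an illustration of the pattern, not Mahout code: the `Named` class below is a hypothetical stand-in for Mahout's NamedVector, the centroids are made-up values, and a single nearest-centroid assignment stands in for a full K-Means run.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NamedVectorSketch {

    /** Hypothetical stand-in for Mahout's NamedVector: a label plus the wrapped values. */
    static final class Named {
        final String name;
        final double[] values;

        Named(String name, double[] values) {
            this.name = name;
            this.values = values;
        }
    }

    /** Squared Euclidean distance; only the numeric values are used, never the name. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Index of the centroid closest to the given point. */
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    /** Assigns each named point to a cluster and returns record ID -> cluster ID. */
    static Map<String, Integer> assign(List<Named> points, double[][] centroids) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Named p : points) {
            // The name travels with the point, so input order no longer matters.
            result.put(p.name, nearest(p.values, centroids));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Named> points = new ArrayList<>();
        points.add(new Named("iris-1", new double[] {5.1, 3.5, 1.4, 0.2}));
        points.add(new Named("iris-51", new double[] {7.0, 3.2, 4.7, 1.4}));
        // Illustrative fixed centroids; a real K-Means run would iterate to convergence.
        double[][] centroids = {{5.0, 3.4, 1.5, 0.2}, {6.5, 3.0, 5.0, 1.6}};
        for (Map.Entry<String, Integer> e : assign(points, centroids).entrySet()) {
            System.out.println(e.getKey() + " -> cluster " + e.getValue());
        }
    }
}
```

Because the ID is carried by each point instead of being inferred from position, the question of whether the output order matches the input order does not need an answer. With Mahout itself, the equivalent step would presumably be to wrap each DenseVector in a NamedVector before writing it out and to read the name back from the clustered vectors afterwards.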
