I'm finding it hard to maintain these labels across vector and matrix factorizations & direct operations.
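
For what it's worth, here is a minimal sketch of the wrapper idea from the quoted thread below: an id travels with the numbers, while the distance math only looks at the numbers. LabeledVector, LabelDemo and assignToNearest are hypothetical names made up for illustration, not Mahout classes, and the records and centroids are illustrative values only. It shows a single nearest-centroid assignment, not a full K-Means run.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the wrapper idea; this is NOT Mahout's NamedVector.
final class LabeledVector {
    private final String name;     // record id carried alongside the numbers
    private final double[] values; // the numeric columns the algorithm sees

    LabeledVector(String name, double... values) {
        this.name = name;
        this.values = values.clone();
    }

    String getName() {
        return name;
    }

    // Squared Euclidean distance; the label plays no part in the math.
    double squaredDistanceTo(double[] point) {
        double sum = 0.0;
        for (int i = 0; i < values.length; i++) {
            double d = values[i] - point[i];
            sum += d * d;
        }
        return sum;
    }
}

final class LabelDemo {
    // Assigns each record to its nearest centroid and reports the result by id.
    static Map<String, Integer> assignToNearest(List<LabeledVector> records,
                                                double[][] centroids) {
        Map<String, Integer> clusterById = new LinkedHashMap<>();
        for (LabeledVector record : records) {
            int best = 0;
            for (int c = 1; c < centroids.length; c++) {
                if (record.squaredDistanceTo(centroids[c])
                        < record.squaredDistanceTo(centroids[best])) {
                    best = c;
                }
            }
            clusterById.put(record.getName(), best);
        }
        return clusterById;
    }

    public static void main(String[] args) {
        // Illustrative values only: four numeric columns plus a generated id.
        List<LabeledVector> records = Arrays.asList(
            new LabeledVector("row-1", 5.1, 3.5, 1.4, 0.2),
            new LabeledVector("row-2", 6.3, 3.3, 6.0, 2.5));
        double[][] centroids = { {5.0, 3.4, 1.5, 0.2}, {6.5, 3.0, 5.5, 2.0} };
        System.out.println(assignToNearest(records, centroids));
    }
}
```

The label survives here only because the wrapper object itself is passed along. Any operation that builds a fresh plain vector from the values (a factorization, for instance) would have to re-attach the id explicitly, which is the difficulty I mean above.
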

On Mon, Jul 11, 2011 at 1:10 AM, Gabor Makrai <[email protected]> wrote:

> Thank you very much! NamedVector should solve my problem!
> Anyway, I'm always amazed at the answer speed on the Hadoop lists!
>
> Thank you,
> Gabor
>
> On Mon, Jul 11, 2011 at 3:51 AM, Lance Norskog <[email protected]> wrote:
>
>> The NamedVector class adds a string to any vector, forwarding all
>> methods to the wrapped vector. You can cluster these, and then pull
>> the strings. The clustering algorithm operates on the wrapped vector.
>>
>> Lance
>>
>> On Sun, Jul 10, 2011 at 4:18 PM, Gabor Makrai <[email protected]> wrote:
>> > Hi,
>> >
>> > I'm a little bit confused about Mahout's clustering algorithms. I'd like
>> > to cluster data that has an id column. How can I do that?
>> > For example, I'd like to run K-Means clustering on the Iris data set
>> > (http://archive.ics.uci.edu/ml/datasets/Iris), where I've got four
>> > numerical columns. I generated an id column to identify the records, and
>> > when the clustering is done, I'd like to see the results.
>> > When I examined the code, I realized that I can create DenseVector
>> > instances (with the four numerical columns, without the id) and write
>> > those in VectorWritable format. These were my input data. After I managed
>> > to run K-Means, I got IntWritable/WeightedVectorWritable key/value pairs,
>> > where the keys tell me the clusterID. Is it possible to handle the ID
>> > attribute somehow?
>> > Maybe the order of the output data is the same as the input data? Can
>> > anyone confirm this?
>> >
>> > Thank you very much,
>> > Gabor Makrai
>> >
>>
>> --
>> Lance Norskog
>> [email protected]
>>
>

--
Lance Norskog
[email protected]
