Re: Clustering with id

Gabor Makrai Mon, 11 Jul 2011 01:19:53 -0700

Thank you very much! NamedVector should solve my problem!
Anyway, I'm always amazed at how quickly questions get answered on the Hadoop lists!


Thank you,
Gabor

On Mon, Jul 11, 2011 at 3:51 AM, Lance Norskog <[email protected]> wrote:

> The NamedVector class adds a string to any vector, forwarding all
> methods to the wrapped vector. You can cluster these, and then pull
> the strings. The clustering algorithm operates on the wrapped vector.
>
> Lance
>
> On Sun, Jul 10, 2011 at 4:18 PM, Gabor Makrai <[email protected]> wrote:
> > Hi,
> >
> > I'm a little bit confused about Mahout's clustering algorithms. I'd like to
> > cluster data that has an id column. How can I do that?
> > For example, I'd like to run K-Means clustering on the Iris data set
> > (http://archive.ics.uci.edu/ml/datasets/Iris), which has four numerical
> > columns. I generated an id column to identify the records, and when the
> > clustering is done, I'd like to see the results.
> > Looking at the code, I realized that I can create DenseVector instances
> > (with the four numerical columns, without the id) and write them in
> > VectorWritable format. These were my input data. After I managed to run
> > K-Means, I got IntWritable/WeightedVectorWritable key/value pairs, where
> > the keys tell me the cluster ID. Is it possible to handle the ID attribute
> > somehow? Maybe the order of the output data is the same as the input data?
> > Can anyone confirm this?
> >
> > Thank you very much,
> > Gabor Makrai
> >
>
>
>
> --
> Lance Norskog
> [email protected]
>
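A minimal sketch of the pattern Lance describes, written in plain Java so it runs without Mahout on the classpath. `Vec`, `NamedVec`, `assign`, the record ids and the centroid values are all hypothetical illustrations, not Mahout classes: `NamedVec` carries an id string and forwards the distance computation to the wrapped values, so the id never influences the clustering math, and `assign` (only the nearest-centroid step of K-Means, not the full iterative algorithm) reads the id back out when reporting each point's cluster. In Mahout itself the analogous step would presumably be wrapping each `DenseVector` in a `NamedVector` (with the record id as its name) before writing the `VectorWritable` input; check the javadoc for your Mahout version for the exact constructor and accessor.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified illustration of the "named vector" wrapper idea.
// This is NOT Mahout's API; it only demonstrates the delegation pattern.
public class NamedClusteringSketch {

    /** Plain numeric vector: the only thing the clustering math ever sees. */
    static final class Vec {
        final double[] values;

        Vec(double... values) { this.values = values; }

        double squaredDistance(Vec other) {
            double sum = 0.0;
            for (int i = 0; i < values.length; i++) {
                double d = values[i] - other.values[i];
                sum += d * d;
            }
            return sum;
        }
    }

    /** Wrapper that attaches a string id and forwards distance calls to the delegate. */
    static final class NamedVec {
        final Vec delegate;
        final String name;

        NamedVec(Vec delegate, String name) {
            this.delegate = delegate;
            this.name = name;
        }

        // Forwarded: the name plays no part in the distance.
        double squaredDistance(Vec other) { return delegate.squaredDistance(other); }
    }

    /** One K-Means assignment step: maps each point's name to its nearest centroid index. */
    static Map<String, Integer> assign(List<NamedVec> points, List<Vec> centroids) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (NamedVec p : points) {
            int best = -1;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int c = 0; c < centroids.size(); c++) {
                double d = p.squaredDistance(centroids.get(c));
                if (d < bestDist) {
                    bestDist = d;
                    best = c;
                }
            }
            // The id never entered the computation; we simply read it back here.
            result.put(p.name, best);
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy four-column records with hypothetical ids.
        List<NamedVec> points = new ArrayList<>();
        points.add(new NamedVec(new Vec(1.0, 1.0, 0.0, 0.0), "rec-1"));
        points.add(new NamedVec(new Vec(9.0, 9.0, 5.0, 5.0), "rec-2"));
        points.add(new NamedVec(new Vec(1.5, 0.5, 0.0, 0.5), "rec-3"));

        // Hypothetical centroids standing in for a finished clustering run.
        List<Vec> centroids = new ArrayList<>();
        centroids.add(new Vec(1.0, 1.0, 0.0, 0.0));
        centroids.add(new Vec(9.0, 9.0, 5.0, 5.0));

        System.out.println(assign(points, centroids)); // {rec-1=0, rec-2=1, rec-3=0}
    }
}
```

Each id comes back paired with its cluster index, so the result does not depend on the output order matching the input order.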
