Thank you very much! NamedVector should solve my problem!
Anyway, I'm always amazed at how quickly questions get answered on the Hadoop lists!

Thank you,
Gabor

On Mon, Jul 11, 2011 at 3:51 AM, Lance Norskog <[email protected]> wrote:

> The NamedVector class adds a string to any vector, forwarding all
> methods to the wrapped vector. You can cluster these, and then pull out
> the strings. The clustering algorithm operates on the wrapped vector.
>
> Lance
>
> On Sun, Jul 10, 2011 at 4:18 PM, Gabor Makrai <[email protected]> wrote:
> > Hi,
> >
> > I'm a little bit confused about Mahout's clustering algorithms. I'd like to
> > cluster data that has an id column. How can I do that?
> > For example, I'd like to run K-Means clustering on the Iris data set
> > (http://archive.ics.uci.edu/ml/datasets/Iris), which has four numerical
> > columns. I generated an id column to identify the records, and when the
> > clustering is done, I'd like to see the results per id.
> > When I examined the code, I realized that I can create DenseVector
> > instances (with the four numerical columns, without the id) and write
> > them in VectorWritable format. These were my input data. After I managed
> > to run K-Means, I got IntWritable/WeightedVectorWritable key/value pairs,
> > where the keys tell me the cluster ID. Is it possible to handle the ID
> > attribute somehow? Maybe the output data is in the same order as the
> > input data? Can anyone confirm this?
> >
> > Thank you very much,
> > Gabor Makrai
> >
>
> --
> Lance Norskog
> [email protected]
>
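
The pattern Lance describes (attach a name to each vector, cluster on the wrapped values, then read the names back) can be sketched without Mahout on the classpath. The types `Vec`, `DenseVec` and `NamedVec` and the method `kMeans` below are hypothetical stand-ins written for illustration. They are not Mahout's `Vector`/`NamedVector` API. The clustering is a plain single-machine Lloyd's k-means, not Mahout's MapReduce K-Means job. The sketch shows three things. The id rides along with each vector. The distance computation only sees the forwarded numeric values. The ids are read from the clustered points themselves rather than inferred from output order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class NamedVectorSketch {

    // Hypothetical minimal vector interface (a stand-in, not Mahout's Vector).
    interface Vec {
        int size();
        double get(int i);
    }

    // Hypothetical dense vector backed by a double array.
    static final class DenseVec implements Vec {
        private final double[] values;
        DenseVec(double... values) { this.values = values.clone(); }
        public int size() { return values.length; }
        public double get(int i) { return values[i]; }
    }

    // Hypothetical wrapper: attaches an id string and forwards every Vec method
    // to the wrapped vector, so the clustering code never looks at the name.
    static final class NamedVec implements Vec {
        private final Vec delegate;
        private final String name;
        NamedVec(Vec delegate, String name) { this.delegate = delegate; this.name = name; }
        public String getName() { return name; }
        public int size() { return delegate.size(); }
        public double get(int i) { return delegate.get(i); }
    }

    // Rows 1, 2, 51 and 52 of the Iris measurements, tagged with generated ids.
    static List<NamedVec> irisSample() {
        return Arrays.asList(
            new NamedVec(new DenseVec(5.1, 3.5, 1.4, 0.2), "row-1"),
            new NamedVec(new DenseVec(4.9, 3.0, 1.4, 0.2), "row-2"),
            new NamedVec(new DenseVec(7.0, 3.2, 4.7, 1.4), "row-51"),
            new NamedVec(new DenseVec(6.4, 3.2, 4.5, 1.5), "row-52"));
    }

    static double[] toArray(Vec v) {
        double[] out = new double[v.size()];
        for (int i = 0; i < out.length; i++) out[i] = v.get(i);
        return out;
    }

    static double squaredDistance(Vec v, double[] c) {
        double sum = 0;
        for (int i = 0; i < v.size(); i++) { double d = v.get(i) - c[i]; sum += d * d; }
        return sum;
    }

    // Lloyd's k-means; returns cluster index -> the (still named) points in that cluster.
    static <V extends Vec> Map<Integer, List<V>> kMeans(List<V> points, List<double[]> initial, int iterations) {
        List<double[]> centroids = new ArrayList<>();
        for (double[] c : initial) centroids.add(c.clone());
        int dim = centroids.get(0).length;
        int[] assign = new int[points.size()];
        for (int it = 0; it < iterations; it++) {
            // Assignment step: nearest centroid by squared Euclidean distance.
            for (int p = 0; p < points.size(); p++) {
                int best = 0;
                for (int k = 1; k < centroids.size(); k++) {
                    if (squaredDistance(points.get(p), centroids.get(k))
                            < squaredDistance(points.get(p), centroids.get(best))) best = k;
                }
                assign[p] = best;
            }
            // Update step: move each non-empty cluster's centroid to the mean of its points.
            for (int k = 0; k < centroids.size(); k++) {
                double[] sum = new double[dim];
                int count = 0;
                for (int p = 0; p < points.size(); p++) {
                    if (assign[p] != k) continue;
                    count++;
                    for (int i = 0; i < dim; i++) sum[i] += points.get(p).get(i);
                }
                if (count > 0) {
                    for (int i = 0; i < dim; i++) sum[i] /= count;
                    centroids.set(k, sum);
                }
            }
        }
        Map<Integer, List<V>> result = new TreeMap<>();
        for (int p = 0; p < points.size(); p++) {
            result.computeIfAbsent(assign[p], k -> new ArrayList<>()).add(points.get(p));
        }
        return result;
    }

    public static void main(String[] args) {
        List<NamedVec> rows = irisSample();
        // Seed the two centroids with the values of row-1 and row-51.
        List<double[]> initial = Arrays.asList(toArray(rows.get(0)), toArray(rows.get(2)));
        // The ids come straight off the clustered points, not from their position.
        for (Map.Entry<Integer, List<NamedVec>> e : kMeans(rows, initial, 10).entrySet()) {
            List<String> names = new ArrayList<>();
            for (NamedVec v : e.getValue()) names.add(v.getName());
            System.out.println("cluster " + e.getKey() + ": " + names);
        }
        // Prints "cluster 0: [row-1, row-2]" then "cluster 1: [row-51, row-52]"
    }
}
```

In Mahout itself, the analogous step would presumably be to wrap each four-column `DenseVector` in a `NamedVector` that carries the generated id before writing it as `VectorWritable`. It is worth checking the clustered output of your Mahout version to confirm that the names are preserved there.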
