Re: Clustering with id

Lance Norskog Mon, 11 Jul 2011 20:59:21 -0700

I'm finding it hard to maintain these labels across vector and matrix
factorizations & direct operations.


On Mon, Jul 11, 2011 at 1:10 AM, Gabor Makrai <[email protected]> wrote:
> Thank you very much! NamedVector should solve my problem!
> By the way, I'm always amazed at how quickly answers arrive on the Hadoop lists!
>
> Thank you,
> Gabor
>
> On Mon, Jul 11, 2011 at 3:51 AM, Lance Norskog <[email protected]> wrote:
>
>> The NamedVector class adds a string to any vector, forwarding all
>> methods to the wrapped vector. You can cluster these, and then pull
>> the strings. The clustering algorithm operates on the wrapped vector.
>>
>> Lance
>>
>> On Sun, Jul 10, 2011 at 4:18 PM, Gabor Makrai <[email protected]>
>> wrote:
>> > Hi,
>> >
>> > I'm a little bit confused about Mahout's clustering algorithms. I'd like to
>> > cluster data that has an id column. How can I do that?
>> > For example, I'd like to run K-Means clustering on the Iris data set
>> > (http://archive.ics.uci.edu/ml/datasets/Iris), where I've got four numerical
>> > columns. I generated an id column to identify the records, and when the
>> > clustering is done, I'd like to see the results.
>> > When I examined the code, I realized that I can create DenseVector instances
>> > (with the four numerical columns, without the id) and write those in
>> > VectorWritable format. These were my input data. After I managed to run
>> > K-Means, I got IntWritable/WeightedVectorWritable key/value pairs, where
>> > the keys tell me the cluster ID. Is it possible to handle the ID attribute
>> > somehow? Maybe the order of the output data is the same as the input data?
>> > Can anyone confirm this?
>> >
>> > Thank you very much,
>> > Gabor Makrai
>> >
>>
>>
>>
>> --
>> Lance Norskog
>> [email protected]
>>
>



-- 
Lance Norskog
[email protected]
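
The idea described in the thread (attach a string to each vector, cluster on the numeric values only, then read the strings back) can be sketched roughly as below. This is not Mahout code: LabeledVector is a hypothetical stand-in for the role NamedVector plays, the ids "row-1" to "row-3" and the two centroids are made up for illustration, and the code performs a single nearest-centroid assignment rather than a full K-Means run.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LabeledClusteringSketch {

    /** Hypothetical stand-in for a named vector: an id plus the numeric values. */
    static final class LabeledVector {
        final String name;
        final double[] values;

        LabeledVector(String name, double... values) {
            this.name = name;
            this.values = values;
        }
    }

    /** Squared Euclidean distance; only the numeric values are used, never the label. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Index of the centroid closest to the given point. */
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    /** Assigns each labeled vector to a cluster and returns clusterId -> list of ids. */
    static Map<Integer, List<String>> assign(List<LabeledVector> points, double[][] centroids) {
        Map<Integer, List<String>> result = new LinkedHashMap<>();
        for (LabeledVector p : points) {
            int cluster = nearest(p.values, centroids);
            result.computeIfAbsent(cluster, k -> new ArrayList<>()).add(p.name);
        }
        return result;
    }

    public static void main(String[] args) {
        // Four numeric columns per record, plus a generated id (illustrative values).
        List<LabeledVector> points = List.of(
            new LabeledVector("row-1", 5.1, 3.5, 1.4, 0.2),
            new LabeledVector("row-2", 7.0, 3.2, 4.7, 1.4),
            new LabeledVector("row-3", 4.9, 3.0, 1.4, 0.2));
        // Assumed centroids; a real run would learn these iteratively.
        double[][] centroids = {
            {5.0, 3.4, 1.5, 0.2},
            {6.5, 3.0, 5.0, 1.6}};
        System.out.println(assign(points, centroids));
    }
}
```

Because the id travels with the vector, the result does not depend on the output order matching the input order, which was the open question in the original post.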
