Re: Clustering with id

Lance Norskog Mon, 11 Jul 2011 21:46:57 -0700

My algorithm was wrong anyway, and I was making things harder for myself
than I needed to.


On Mon, Jul 11, 2011 at 9:36 PM, Ted Dunning <[email protected]> wrote:
> Can you give specific examples?  The process should be relatively
> straightforward: in a product, rows take their labels from the left
> operand and columns take theirs from the right operand, and that rule
> should be sufficient.  Sums should have the same row and column labels,
> if any.  Everything else should follow from these constraints.
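Ted's rule can be made concrete with a small plain-Java sketch. The LabeledMatrix class below is hypothetical and is not part of Mahout. Its times() method gives the result the row labels of the left operand and the column labels of the right operand. Its plus() method refuses operands whose labels differ.

```java
import java.util.List;

/**
 * Hypothetical labeled matrix (not a Mahout class) that tracks row and
 * column labels through products and sums.
 */
class LabeledMatrix {
    final double[][] values;
    final List<String> rowLabels;
    final List<String> colLabels;

    LabeledMatrix(double[][] values, List<String> rowLabels, List<String> colLabels) {
        if (values.length != rowLabels.size()) {
            throw new IllegalArgumentException("one label per row required");
        }
        for (double[] row : values) {
            if (row.length != colLabels.size()) {
                throw new IllegalArgumentException("one label per column required");
            }
        }
        this.values = values;
        this.rowLabels = List.copyOf(rowLabels);
        this.colLabels = List.copyOf(colLabels);
    }

    /** Product: row labels come from this (left) operand, column labels from the right one. */
    LabeledMatrix times(LabeledMatrix right) {
        int n = values.length;
        int inner = colLabels.size();
        int m = right.colLabels.size();
        if (inner != right.rowLabels.size()) {
            throw new IllegalArgumentException("inner dimensions differ");
        }
        double[][] out = new double[n][m];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                for (int t = 0; t < inner; t++) {
                    out[i][j] += values[i][t] * right.values[t][j];
                }
            }
        }
        return new LabeledMatrix(out, rowLabels, right.colLabels);
    }

    /** Sum: both operands must carry identical row and column labels. */
    LabeledMatrix plus(LabeledMatrix other) {
        if (!rowLabels.equals(other.rowLabels) || !colLabels.equals(other.colLabels)) {
            throw new IllegalArgumentException("sum operands must share labels");
        }
        double[][] out = new double[values.length][colLabels.size()];
        for (int i = 0; i < out.length; i++) {
            for (int j = 0; j < out[i].length; j++) {
                out[i][j] = values[i][j] + other.values[i][j];
            }
        }
        return new LabeledMatrix(out, rowLabels, colLabels);
    }

    public static void main(String[] args) {
        // Hypothetical labels: documents x terms, then terms x topics.
        LabeledMatrix docsByTerms = new LabeledMatrix(
            new double[][] {{1, 2}, {3, 4}}, List.of("doc1", "doc2"), List.of("t1", "t2"));
        LabeledMatrix termsByTopics = new LabeledMatrix(
            new double[][] {{1}, {1}}, List.of("t1", "t2"), List.of("topicA"));
        LabeledMatrix product = docsByTerms.times(termsByTopics);
        System.out.println(product.rowLabels + " x " + product.colLabels); // [doc1, doc2] x [topicA]
        LabeledMatrix doubled = product.plus(product);
        System.out.println(doubled.values[1][0]); // 14.0
    }
}
```

The sketch only checks that the inner dimensions have the same size. Whether the inner labels (the left operand's column labels and the right operand's row labels) should also be required to match is a design choice the thread leaves open.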
>
> On Mon, Jul 11, 2011 at 8:59 PM, Lance Norskog <[email protected]> wrote:
>
>> I mean, walking through the algorithms and tracking what vector name
>> becomes what matrix row/column label.
>>
>> On Mon, Jul 11, 2011 at 8:58 PM, Lance Norskog <[email protected]> wrote:
>> > I'm finding it hard to maintain these labels across vector and matrix
>> > factorizations & direct operations.
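Ted's product rule also settles the factorization case. The example below uses a truncated SVD purely for illustration, since the thread does not name a specific factorization. Read as the product (U Σ) V^T, the labels must line up like this:

```latex
A \approx U \Sigma V^{\top} = (U \Sigma)\, V^{\top}
\quad\Longrightarrow\quad
\operatorname{rowLabels}(U) = \operatorname{rowLabels}(A),
\qquad
\operatorname{colLabels}(V^{\top}) = \operatorname{rowLabels}(V) = \operatorname{colLabels}(A)
```

The k inner (latent) dimensions are the columns of U, the rows and columns of Σ, and the rows of V^T. They get no labels from A and have to be named separately, for example factor 1 through k.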
>> >
>> >> On Mon, Jul 11, 2011 at 1:10 AM, Gabor Makrai <[email protected]> wrote:
>> >> Thank you very much! NamedVector should solve my problem!
>> >> Anyway, I'm always amazed at how quickly questions get answered on
>> >> the Hadoop lists!
>> >>
>> >> Thank you,
>> >> Gabor
>> >>
>> >> On Mon, Jul 11, 2011 at 3:51 AM, Lance Norskog <[email protected]> wrote:
>> >>
>> >>> The NamedVector class attaches a name (a string) to any vector and
>> >>> forwards all Vector methods to the wrapped vector. You can cluster
>> >>> these and then pull the names back out of the results. The clustering
>> >>> algorithm operates only on the wrapped vector.
>> >>>
>> >>> Lance
>> >>>
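Lance is describing a decorator. The sketch below reproduces the idea in plain Java so that it runs without Mahout.

- Vec, DenseVec, NamedVec, nearestCentroid, and assign are hypothetical names.
- nearestCentroid performs a single nearest-centroid assignment step, not a full K-Means run.
- The feature values in main are only sample numbers.

In Mahout itself, the wrapper is org.apache.mahout.math.NamedVector. It is built with `new NamedVector(delegate, name)` and read back with `getName()`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NamedClustering {
    /** Hypothetical minimal vector interface (a stand-in for Mahout's Vector). */
    interface Vec {
        int size();
        double get(int i);
    }

    /** Dense vector holding only the numeric features, without the id. */
    static final class DenseVec implements Vec {
        private final double[] values;
        DenseVec(double... values) { this.values = values.clone(); }
        public int size() { return values.length; }
        public double get(int i) { return values[i]; }
    }

    /** Decorator: carries a name and forwards every Vec method to the delegate. */
    static final class NamedVec implements Vec {
        private final Vec delegate;
        private final String name;
        NamedVec(Vec delegate, String name) { this.delegate = delegate; this.name = name; }
        String getName() { return name; }
        public int size() { return delegate.size(); }
        public double get(int i) { return delegate.get(i); }
    }

    /** Index of the nearest centroid by squared Euclidean distance; sees only the numbers. */
    static int nearestCentroid(Vec v, double[][] centroids) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double d = 0;
            for (int i = 0; i < v.size(); i++) {
                double diff = v.get(i) - centroids[c][i];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    /** Assigns each point to a cluster and reports the assignment by name. */
    static Map<String, Integer> assign(List<NamedVec> points, double[][] centroids) {
        Map<String, Integer> byName = new LinkedHashMap<>();
        for (NamedVec p : points) {
            byName.put(p.getName(), nearestCentroid(p, centroids));
        }
        return byName;
    }

    public static void main(String[] args) {
        List<NamedVec> points = new ArrayList<>();
        // Hypothetical ids wrapped around four numeric features.
        points.add(new NamedVec(new DenseVec(5.1, 3.5, 1.4, 0.2), "iris-1"));
        points.add(new NamedVec(new DenseVec(6.3, 3.3, 6.0, 2.5), "iris-2"));
        double[][] centroids = {{5.0, 3.4, 1.5, 0.2}, {6.5, 3.0, 5.5, 2.0}};
        System.out.println(assign(points, centroids)); // {iris-1=0, iris-2=1}
    }
}
```

NamedVec implements the same interface as the vector it wraps. As a result, the distance code never needs to know that an id exists. The algorithm works on the wrapped values and the names simply ride along.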
>> >>> On Sun, Jul 10, 2011 at 4:18 PM, Gabor Makrai <[email protected]> wrote:
>> >>> > Hi,
>> >>> >
>> >>> > I'm a little bit confused about Mahout's clustering algorithms. I'd
>> >>> > like to cluster data that has an id column. How can I do that?
>> >>> > For example, I'd like to run K-Means clustering on the Iris data set
>> >>> > (http://archive.ics.uci.edu/ml/datasets/Iris), which has four
>> >>> > numerical columns. I generated an id column to identify the records,
>> >>> > and when the clustering is done, I'd like to see the results for each
>> >>> > record.
>> >>> > When I examined the code, I realized that I can create DenseVector
>> >>> > instances (from the four numerical columns, without the id) and write
>> >>> > them in VectorWritable format. That was my input data. After I
>> >>> > managed to run K-Means, I got IntWritable/WeightedVectorWritable
>> >>> > key/value pairs, where the keys tell me the cluster ID. Is it possible
>> >>> > to handle the ID attribute somehow? Maybe the output data is in the
>> >>> > same order as the input data? Can anyone confirm this?
>> >>> >
>> >>> > Thank you very much,
>> >>> > Gabor Makrai
>> >>> >
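Nobody in the thread confirms that the output order matches the input order, and naming the vectors makes that question moot. Below is a plain-Java sketch of the reading side under that approach.

The Assignment class is a hypothetical stand-in for one key/value pair after unwrapping: the cluster id from the IntWritable key and the name carried by the vector. With Mahout, the name would come from the vector inside the WeightedVectorWritable value, via getVector() and then getName() when it is a NamedVector. Check those calls against the Mahout version in use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ClusterMembers {
    /** Hypothetical stand-in for one unwrapped output pair: cluster id plus vector name. */
    static final class Assignment {
        final int clusterId;
        final String name;
        Assignment(int clusterId, String name) {
            this.clusterId = clusterId;
            this.name = name;
        }
    }

    /** Groups record ids by cluster id; the order of the input pairs does not matter. */
    static Map<Integer, List<String>> membersByCluster(List<Assignment> pairs) {
        Map<Integer, List<String>> groups = new TreeMap<>();
        for (Assignment a : pairs) {
            groups.computeIfAbsent(a.clusterId, k -> new ArrayList<>()).add(a.name);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Pairs deliberately out of input order, with hypothetical ids.
        List<Assignment> pairs = List.of(
            new Assignment(7, "iris-3"),
            new Assignment(2, "iris-1"),
            new Assignment(7, "iris-2"));
        System.out.println(membersByCluster(pairs)); // {2=[iris-1], 7=[iris-3, iris-2]}
    }
}
```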
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Lance Norskog
>> >>> [email protected]
>> >>>
>> >>
>> >
>> >
>> >
>> > --
>> > Lance Norskog
>> > [email protected]
>> >
>>
>>
>>
>> --
>> Lance Norskog
>> [email protected]
>>
>



-- 
Lance Norskog
[email protected]
