My algorithm was wrong anyway, and I was making things harder for myself than I needed to.
On Mon, Jul 11, 2011 at 9:36 PM, Ted Dunning wrote:
> Can you give specific examples? The process should be relatively
> straightforward, and the implication that rows have row labels that are
> defined by the left operand of a product and columns have column labels that
> are defined by the right operand should be sufficient. Sums should have the
> same row and column labels, if any. From these constraints everything else
> should follow.
>
> On Mon, Jul 11, 2011 at 8:59 PM, Lance Norskog wrote:
>
>> I mean walking through the algorithms and tracking which vector name
>> becomes which matrix row/column label.
>>
>> On Mon, Jul 11, 2011 at 8:58 PM, Lance Norskog wrote:
>> > I'm finding it hard to maintain these labels across vector and matrix
>> > factorizations and direct operations.
>> >
>> > On Mon, Jul 11, 2011 at 1:10 AM, Gabor Makrai wrote:
>> >> Thank you very much! NamedVector should solve my problem!
>> >> Anyway, I'm always amazed at how fast answers come on the Hadoop lists!
>> >>
>> >> Thank you,
>> >> Gabor
>> >>
>> >> On Mon, Jul 11, 2011 at 3:51 AM, Lance Norskog wrote:
>> >>
>> >>> The NamedVector class adds a string to any vector, forwarding all
>> >>> methods to the wrapped vector. You can cluster these, and then pull
>> >>> the strings. The clustering algorithm operates on the wrapped vector.
>> >>>
>> >>> Lance
>> >>>
>> >>> On Sun, Jul 10, 2011 at 4:18 PM, Gabor Makrai wrote:
>> >>> > Hi,
>> >>> >
>> >>> > I'm a little bit confused about Mahout's clustering algorithms. I'd
>> >>> > like to cluster data that has an id column. How can I do that?
>> >>> > For example, I'd like to run K-Means clustering on the Iris data set
>> >>> > (http://archive.ics.uci.edu/ml/datasets/Iris), where I've got four
>> >>> > numerical columns. I generated an id column to identify the records,
>> >>> > and when the clustering is done, I'd like to see the results.
>> >>> > When I examined the code, I realized that I can create DenseVector
>> >>> > instances (with the four numerical columns, without the id) and write
>> >>> > them in VectorWritable format. That was my input data. After I managed
>> >>> > to run K-Means, I got IntWritable/WeightedVectorWritable key/value
>> >>> > pairs, where the keys tell me the cluster ID. Is it possible to handle
>> >>> > the ID attribute somehow? Maybe the output data is in the same order
>> >>> > as the input data? Can anyone confirm this?
>> >>> >
>> >>> > Thank you very much,
>> >>> > Gabor Makrai
>> >>> >
>> >>>
>> >>> --
>> >>> Lance Norskog
>> >>
>> >
>> > --
>> > Lance Norskog
>>
>> --
>> Lance Norskog
>
--
Lance Norskog
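Lance's description of NamedVector is the decorator pattern: the name travels with the vector, while every numeric operation is forwarded to the wrapped vector, so a clustering step never sees the id but the id is still there afterwards. The sketch below shows that idea with self-contained, hypothetical stand-in types (`Vec`, `DenseVec`, `NamedVec`); they are not Mahout's `Vector`/`DenseVector`/`NamedVector` API. A single nearest-centroid assignment step stands in for one K-Means iteration, and the Iris-like rows and the centroids are illustrative values only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a vector interface; NOT Mahout's Vector API.
interface Vec {
    int size();
    double get(int i);
}

// Plain dense vector holding only the numeric columns (no id).
final class DenseVec implements Vec {
    private final double[] values;

    DenseVec(double... values) { this.values = values.clone(); }

    public int size() { return values.length; }
    public double get(int i) { return values[i]; }
}

// Decorator in the spirit of NamedVector: carries a name and forwards
// every Vec method to the wrapped vector.
final class NamedVec implements Vec {
    private final Vec delegate;
    private final String name;

    NamedVec(Vec delegate, String name) {
        this.delegate = delegate;
        this.name = name;
    }

    String getName() { return name; }
    public int size() { return delegate.size(); }
    public double get(int i) { return delegate.get(i); }
}

public class NamedClusteringSketch {
    // Squared Euclidean distance; reads only size() and get(), never the name.
    static double dist2(Vec a, Vec b) {
        double sum = 0;
        for (int i = 0; i < a.size(); i++) {
            double d = a.get(i) - b.get(i);
            sum += d * d;
        }
        return sum;
    }

    // One K-Means assignment step: each point goes to its nearest centroid.
    // Returns "name -> cluster index" so each record's id survives clustering.
    static List<String> assign(List<NamedVec> points, List<Vec> centroids) {
        List<String> out = new ArrayList<>();
        for (NamedVec p : points) {
            int best = 0;
            double bestDist = dist2(p, centroids.get(0));
            for (int c = 1; c < centroids.size(); c++) {
                double d = dist2(p, centroids.get(c));
                if (d < bestDist) {
                    bestDist = d;
                    best = c;
                }
            }
            out.add(p.getName() + " -> " + best);
        }
        return out;
    }

    // Three Iris-like rows (four numeric columns each) and two fixed centroids.
    static List<String> demo() {
        List<NamedVec> points = List.of(
            new NamedVec(new DenseVec(5.1, 3.5, 1.4, 0.2), "id-1"),
            new NamedVec(new DenseVec(6.7, 3.0, 5.2, 2.3), "id-2"),
            new NamedVec(new DenseVec(4.9, 3.0, 1.4, 0.2), "id-3"));
        List<Vec> centroids = List.of(
            new DenseVec(5.0, 3.4, 1.5, 0.2),
            new DenseVec(6.5, 3.0, 5.5, 2.0));
        return assign(points, centroids);
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Running `main` prints `id-1 -> 0`, `id-2 -> 1` and `id-3 -> 0`. Because the id rides inside each record, this approach does not depend on the output order matching the input order, which was Gabor's fallback idea.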

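Ted's label rules can be made concrete with a small labeled-matrix class. The sketch below is hypothetical (`LabeledMatrix`, `times`, `plus`, `transpose` and the doc/term/topic labels are illustrative names, not Mahout API). A product takes its row labels from the left operand and its column labels from the right operand, and a sum requires the operands to agree on any labels they carry. The transpose rule (labels swap) is an assumption derived from "everything else should follow", not something stated in the thread.

```java
import java.util.List;

// Hypothetical labeled matrix, written only to illustrate label propagation;
// it is not a Mahout class.
public class LabeledMatrix {
    final double[][] v;
    final List<String> rowLabels; // null means "unlabeled"
    final List<String> colLabels; // null means "unlabeled"

    LabeledMatrix(double[][] v, List<String> rowLabels, List<String> colLabels) {
        this.v = v;
        this.rowLabels = rowLabels;
        this.colLabels = colLabels;
    }

    int rows() { return v.length; }
    int cols() { return v[0].length; }

    // Product: row labels come from the left operand, column labels from the right.
    LabeledMatrix times(LabeledMatrix o) {
        if (cols() != o.rows()) {
            throw new IllegalArgumentException("inner dimensions differ");
        }
        double[][] r = new double[rows()][o.cols()];
        for (int i = 0; i < rows(); i++)
            for (int j = 0; j < o.cols(); j++)
                for (int k = 0; k < cols(); k++)
                    r[i][j] += v[i][k] * o.v[k][j];
        return new LabeledMatrix(r, rowLabels, o.colLabels);
    }

    // Sum: operands must agree on any labels they both carry.
    LabeledMatrix plus(LabeledMatrix o) {
        if (rows() != o.rows() || cols() != o.cols()) {
            throw new IllegalArgumentException("shapes differ");
        }
        double[][] r = new double[rows()][cols()];
        for (int i = 0; i < rows(); i++)
            for (int j = 0; j < cols(); j++)
                r[i][j] = v[i][j] + o.v[i][j];
        return new LabeledMatrix(r, merge(rowLabels, o.rowLabels), merge(colLabels, o.colLabels));
    }

    // Transpose (assumed, derived rule): row and column labels swap roles.
    LabeledMatrix transpose() {
        double[][] r = new double[cols()][rows()];
        for (int i = 0; i < rows(); i++)
            for (int j = 0; j < cols(); j++)
                r[j][i] = v[i][j];
        return new LabeledMatrix(r, colLabels, rowLabels);
    }

    // Keep whichever label list is present; reject conflicting labels.
    static List<String> merge(List<String> a, List<String> b) {
        if (a == null) return b;
        if (b == null) return a;
        if (!a.equals(b)) {
            throw new IllegalArgumentException("labels differ: " + a + " vs " + b);
        }
        return a;
    }

    public static void main(String[] args) {
        // Hypothetical documents x terms and terms x topics matrices.
        LabeledMatrix docTerm = new LabeledMatrix(
            new double[][] {{1, 0}, {0, 2}}, List.of("doc1", "doc2"), List.of("apple", "pear"));
        LabeledMatrix termTopic = new LabeledMatrix(
            new double[][] {{1, 1, 0}, {0, 1, 1}}, List.of("apple", "pear"), List.of("t1", "t2", "t3"));

        LabeledMatrix docTopic = docTerm.times(termTopic);
        System.out.println(docTopic.rowLabels + " x " + docTopic.colLabels);

        // A'A: both rows and columns end up labeled by terms.
        LabeledMatrix cooc = docTerm.transpose().times(docTerm);
        System.out.println(cooc.rowLabels + " x " + cooc.colLabels);
    }
}
```

Running `main` prints `[doc1, doc2] x [t1, t2, t3]` and then `[apple, pear] x [apple, pear]`, which is the bookkeeping Lance describes for tracking which vector name becomes which row or column label. Using `null` for "unlabeled" lets a sum with an unlabeled matrix keep the other operand's labels, one reading of Ted's "if any".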