You'll want to add the -nv option to seq2sparse to get NamedVectors out (so
each vector keeps the name of its document) and add the -cl argument to
k-means to get the clustered documents. clusterdump should then give you what
you are looking for.
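
As a rough sketch of that sequence: the -nv and -cl flags are the ones mentioned above, but everything else here is an assumption to check against `mahout <command> --help` for your version. That covers the working directory and file names, the values 20 and 10, the `clusters-10` directory, and the clusterdump options (-s is the input flag I remember from older releases; later ones use -i). The script only prints the commands unless EXECUTE=1 is set.

```shell
#!/bin/sh
# Sketch only. All paths are hypothetical placeholders; -nv and -cl come from
# the reply above, the remaining flags are assumptions to verify with
# `mahout <command> --help` on your Mahout version.

MAHOUT="${MAHOUT:-mahout}"
WORK="${WORK:-mahout-work}"      # hypothetical working directory
EXECUTE="${EXECUTE:-0}"          # 0 = only print the commands

# Print a command, and run it only when EXECUTE=1.
run() {
    printf '%s\n' "$*"
    if [ "$EXECUTE" = "1" ]; then
        "$@"
    fi
}

pipeline() {
    # 1. Vectorize with NamedVectors so each vector keeps its document name.
    run "$MAHOUT" seq2sparse -i "$WORK/seqfiles" -o "$WORK/vectors" -nv

    # 2. Cluster, and also assign every document to a cluster (-cl).
    run "$MAHOUT" kmeans -i "$WORK/vectors/tfidf-vectors" \
        -c "$WORK/initial-clusters" -o "$WORK/clusters" \
        -k 20 -x 10 -cl

    # 3. Dump the clusters together with the points assigned to them.
    #    clusters-10 stands in for whichever iteration directory is last.
    run "$MAHOUT" clusterdump -s "$WORK/clusters/clusters-10" \
        -p "$WORK/clusters/clusteredPoints" \
        -d "$WORK/vectors/dictionary.file-0" -dt sequencefile
}

pipeline
```

Review the printed commands first, then rerun with EXECUTE=1 once the paths match your setup.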

-----Original Message-----
From: Yosep Kim
Sent: Thursday, August 11, 2011 3:43 PM
To: [email protected]
Subject: How to convert

Hello, Everyone!

This is Yosep Kim, and I just started playing with Mahout.
I successfully installed it on my box and got some example data clustered
using the K-Means clustering algorithm.  My input data was all text documents
(i.e. news articles).  When I ran the clusterdump command, I got some cool
information.  However, I was not able to find a way to map it back
to the original documents.  It looks like the algorithm created clusters
based on all the words inside the documents.  Did I understand this
correctly?  How can I create clusters based on documents so I can see that
"document1.txt and document2.txt are in Cluster 1"?  I'd appreciate your
help!!  Thanks.


:CL-16397{n=1032 c=[0:0.125, 0.5:0.019, 0.8m:0.014, 00:0.096, 0000:0.008, 001:0.015, 00139:0.014, 001
        Top Terms:
                c                                       => 2.458502088406289
                software                                => 2.375095306671867
                java                                    => 2.2093305677868598
                project                                 => 1.989917316871096
                application                             => 1.957329582567363
                using                                   => 1.916300386652466
                web                                     => 1.9046723985856817
                development                             => 1.8707247066867443

By the way, Mahout is way cool, and I can't wait to be part of this
"movement".

Yosep
