Hello, Everyone!
This is Yosep Kim, and I just started playing with Mahout.
I successfully installed it on my box and got some example data clustered
using the K-Means clustering algorithm. My input data was all text documents
(i.e. news articles). When I ran the clusterdump command, I got some cool
information (sample output below). However, I was not able to find a way to
map it back to the original documents. It looks like the algorithm created
clusters based on all the words inside the documents. Did I understand this
correctly? How can I create clusters based on documents so I can see that
"document1.txt and document2.txt are in Cluster 1"? I'd appreciate your
help! Thanks.
:CL-16397{n=1032 c=[0:0.125, 0.5:0.019, 0.8m:0.014, 00:0.096, 0000:0.008,
001:0.015, 00139:0.014, 001
Top Terms:
    c            => 2.458502088406289
    software     => 2.375095306671867
    java         => 2.2093305677868598
    project      => 1.989917316871096
    application  => 1.957329582567363
    using        => 1.916300386652466
    web          => 1.9046723985856817
    development  => 1.8707247066867443
By the way, Mahout is way cool, and I can't wait to be part of this
"movement".
Yosep