The 'data' folder in the output directory contains the synthetic control 
dataset converted to Mahout Vector format; it is the input to whichever 
clustering algorithm you select in the script. After clustering, the script 
runs ClusterDumper on the clusters-5 models and pulls the points from the 
'clusteredPoints' directory. ldatopics only works with LDA output; for canopy, 
kmeans, fuzzyk, dirichlet and mean shift, use ClusterDumper instead.
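
If you want to run ClusterDumper by hand rather than through the script, 
something along these lines should be close. This is only a sketch: the paths 
come from your directory listing, the output file name clusters-5.txt is made 
up for illustration, and the flag names (--seqFileDir, --pointsDir, --output) 
are what I would expect from releases of this era and may differ in yours, so 
check "mahout clusterdump --help" first. It only prints the command so you can 
review it before running anything.

```shell
#!/bin/sh
# Assumed locations, based on the directory listing in the question.
OUTPUT_DIR="${OUTPUT_DIR:-examples/output}"
MAHOUT="${MAHOUT:-${MAHOUT_HOME:-.}/bin/mahout}"

# Build the clusterdump command line.
# Flag names are an assumption; confirm them with: $MAHOUT clusterdump --help
# clusters-5.txt is a hypothetical name for the human-readable dump.
clusterdump_cmd() {
  printf '%s clusterdump --seqFileDir %s --pointsDir %s --output %s\n' \
    "$MAHOUT" \
    "$OUTPUT_DIR/clusters-5" \
    "$OUTPUT_DIR/clusteredPoints" \
    "$OUTPUT_DIR/clusters-5.txt"
}

# Dry run: print the command for review instead of executing it.
clusterdump_cmd
```

Once the printed line looks right for your install, pipe it to sh (or paste it 
into your terminal) to produce the text dump.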

-----Original Message-----    
From: wine lover [mailto:[email protected]] 
Sent: Friday, June 24, 2011 12:24 PM
To: [email protected]
Subject: How to read/analyze the clustered result

Hello Everyone,

I just installed Mahout and Hadoop, and began to run the listed
examples.

I followed the example of "clustering of synthetic control data" (
https://cwiki.apache.org/confluence/display/MAHOUT/Clustering+of+synthetic+control+data#FootnoteMarker3).
I chose to use the dirichlet clustering algorithm. It seems to me that
every procedure works fine and the clustering results have been generated.
The output files are listed as follows:
~/workspaceMahout/mahout/trunk/examples/output% ls
clusteredPoints  clusters-0  clusters-1  clusters-2  clusters-3  clusters-4
clusters-5  data


Currently, I have several questions on how to analyze these data.

1) What does the "data" folder stand for in the output directory?
2) I tried to use ldatopics to obtain the result. For the "input vector
directory", should I set it as
-i ./examples/output/clusters-5
3) What does the input dictionary file mean? During my clustering run
($MAHOUT_HOME/bin/mahout
org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job), I was not
asked to give any dictionary file.

Thank you very much for the help.

wenyia
