Hello everyone, I just installed Mahout and Hadoop and began running the listed examples.
I followed the "Clustering of synthetic control data" example (https://cwiki.apache.org/confluence/display/MAHOUT/Clustering+of+synthetic+control+data#FootnoteMarker3) and chose the Dirichlet clustering algorithm. Every step appears to run without problems, and the clustering results have been generated. The output files are:

~/workspaceMahout/mahout/trunk/examples/output% ls
clusteredPoints  clusters-0  clusters-1  clusters-2  clusters-3  clusters-4  clusters-5  data

I have several questions about how to analyze this output:

1) What does the "data" folder in the output directory contain?

2) I tried to use ldatopics to view the results. For the "input vector directory", should I set it to -i ./examples/output/clusters-5 ?

3) What does the input dictionary file mean? During my clustering run ($MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job), I was never asked to provide a dictionary file.

Thank you very much for your help.

wenyia
