Hello everyone,

I just installed Mahout and Hadoop and began running the listed
examples.

I followed the "Clustering of synthetic control data" example
(https://cwiki.apache.org/confluence/display/MAHOUT/Clustering+of+synthetic+control+data#FootnoteMarker3)
and chose the Dirichlet clustering algorithm. Every step seemed to work
fine, and the clustering results were generated. The output directory
contains the following:
~/workspaceMahout/mahout/trunk/examples/output% ls
clusteredPoints  clusters-0  clusters-1  clusters-2  clusters-3  clusters-4
clusters-5  data


I have several questions about how to analyze these results.

1) What does the "data" folder in the output directory contain?
2) I tried to use ldatopics to view the results. For the "input vector
directory", should I set it to
-i ./examples/output/clusters-5 ?
3) What is the input dictionary file? My clustering run
($MAHOUT_HOME/bin/mahout
org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job) never
asked me for a dictionary file.

Thank you very much for the help.

wenyia
