Why don't you just run the Job file in examples (o.a.m.clustering.syntheticcontrol.dirichlet.Job)? It has everything you need except the data file. Once you have that running, you can try to drive Dirichlet on your own.

Jeff
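The ClassCastException in the quoted message says the sequence file holds Text values where the Dirichlet driver expects VectorWritable, so each input line has to be turned into a numeric vector before clustering. Below is a minimal sketch of only the per-line parsing step, assuming each line of the data file is a run of whitespace-separated numbers. The class and method names are hypothetical and the code uses only the JDK; in Mahout the resulting array would then be wrapped in a vector and written out as a VectorWritable, which, as far as I can tell, is the kind of conversion the example Job performs before it clusters.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Mahout: parses one line of a
// whitespace-separated data file into its numeric values.
class SyntheticControlLine {

    // Splits on runs of whitespace and parses each token as a double.
    // Assumes every token on the line is numeric; a blank line yields
    // an empty array.
    static double[] parseLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return new double[0];
        }
        String[] tokens = trimmed.split("\\s+");
        double[] values = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            values[i] = Double.parseDouble(tokens[i]);
        }
        return values;
    }

    public static void main(String[] args) {
        // Sample values chosen for illustration only.
        double[] values = parseLine("28.7812 34.4632   31.3381");
        // In Mahout, this array would be wrapped in a vector and stored
        // in the sequence file as a VectorWritable instead of Text.
        System.out.println(Arrays.toString(values));
    }
}
```

Keeping the parsing separate from the Hadoop writing makes it easy to check on a single line before running a full job.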
-----Original Message-----
From: Keith Thompson [mailto:[email protected]]
Sent: Wednesday, May 11, 2011 11:59 AM
To: [email protected]
Subject: Dirichlet Clustering

I am trying to run the example at https://cwiki.apache.org/confluence/display/MAHOUT/Clustering+of+synthetic+control+data. Since I am new to both Hadoop and Mahout, my problem is most likely an inadequate understanding of Hadoop at this point. I have converted the input file to a sequence file and am now trying to run the Dirichlet clustering algorithm. It seems to want a VectorWritable rather than a text. How do I make the necessary adjustments?

k_thomp@linux-8awa:~> trunk/bin/mahout dirichlet -i output/chunk-0 -o output -x 10 -k 6
Running on hadoop, using HADOOP_HOME=/usr/local/hadoop-0.20.2
No HADOOP_CONF_DIR set, using /usr/local/hadoop-0.20.2/src/conf
11/05/10 14:40:01 INFO common.AbstractJob: Command line arguments: {--alpha=1.0, --distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure, --emitMostLikely=true, --endPhase=2147483647, --input=output/chunk-0, --maxIter=10, --method=mapreduce, --modelDist=org.apache.mahout.clustering.dirichlet.models.GaussianClusterDistribution, --modelPrototype=org.apache.mahout.math.RandomAccessSparseVector, --numClusters=6, --output=output, --startPhase=0, --tempDir=temp, --threshold=0}
Exception in thread "main" java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.mahout.math.VectorWritable
    at org.apache.mahout.clustering.dirichlet.DirichletDriver.readPrototypeSize(DirichletDriver.java:250)
    at org.apache.mahout.clustering.dirichlet.DirichletDriver.run(DirichletDriver.java:112)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.clustering.dirichlet.DirichletDriver.main(DirichletDriver.java:67)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156
