Why don't you just run the Job file in examples
(o.a.m.clustering.syntheticcontrol.dirichlet.Job)? It has everything you need
except the data file. Once you have that running, you can try to drive
Dirichlet on your own.
Jeff
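
For anyone hitting the same ClassCastException: the trace below shows the Dirichlet driver reading VectorWritable values from its input, so a sequence file of Text values will not work, and each input line has to be parsed into numbers first. The following is only a minimal sketch of that parsing step, assuming each line holds whitespace-separated numeric values. LineToVector and parseLine are hypothetical names, not Mahout classes, and a plain double[] stands in for the Mahout vector that a real job would wrap in a VectorWritable.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Mahout: shows only the text-to-numbers
// step that has to happen before values can be stored as vectors.
class LineToVector {

    // Split one line on runs of whitespace and parse each token as a double.
    // A blank line yields an empty array.
    static double[] parseLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return new double[0];
        }
        String[] tokens = trimmed.split("\\s+");
        double[] values = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            values[i] = Double.parseDouble(tokens[i]);
        }
        return values;
    }

    public static void main(String[] args) {
        // Made-up sample line; real input would come from the data file.
        double[] v = parseLine("28.7812 34.4632  31.3381");
        System.out.println(v.length + " values: " + Arrays.toString(v));
    }
}
```

The example Job mentioned above already takes care of converting the data file, so this sketch is only meant to show what kind of conversion is missing when a Text sequence file is passed to the driver directly.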

-----Original Message-----
From: Keith Thompson [mailto:[email protected]] 
Sent: Wednesday, May 11, 2011 11:59 AM
To: [email protected]
Subject: Dirichlet Clustering

I am trying to run the example at
https://cwiki.apache.org/confluence/display/MAHOUT/Clustering+of+synthetic+control+data.
Since I am new to both Hadoop and Mahout, my problem is most likely an
inadequate understanding of Hadoop at this point.  I have converted the
input file to a sequence file and am now trying to run the Dirichlet
clustering algorithm.  It seems to want a VectorWritable rather than a
text.  How do I make the necessary adjustments?

k_thomp@linux-8awa:~> trunk/bin/mahout dirichlet -i output/chunk-0 -o output -x 10 -k 6
Running on hadoop, using HADOOP_HOME=/usr/local/hadoop-0.20.2
No HADOOP_CONF_DIR set, using /usr/local/hadoop-0.20.2/src/conf
11/05/10 14:40:01 INFO common.AbstractJob: Command line arguments:
{--alpha=1.0,
--distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure,
--emitMostLikely=true, --endPhase=2147483647, --input=output/chunk-0,
--maxIter=10, --method=mapreduce,
--modelDist=org.apache.mahout.clustering.dirichlet.models.GaussianClusterDistribution,
--modelPrototype=org.apache.mahout.math.RandomAccessSparseVector,
--numClusters=6, --output=output, --startPhase=0, --tempDir=temp,
--threshold=0}
Exception in thread "main" java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.mahout.math.VectorWritable
        at org.apache.mahout.clustering.dirichlet.DirichletDriver.readPrototypeSize(DirichletDriver.java:250)
        at org.apache.mahout.clustering.dirichlet.DirichletDriver.run(DirichletDriver.java:112)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.mahout.clustering.dirichlet.DirichletDriver.main(DirichletDriver.java:67)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
