Hey,
I was testing the k-means driver on the Reuters-21578 data.
Commands used:
1. bin/mahout seqdirectory -c UTF-8 -i reuters/reuters21578 -o reuters/reuters-seqfiles
2. bin/mahout seq2sparse -i reuters/reuters-seqfiles/ -o reuters/reuters-vectors-bigram -ow -a org.apache.lucene.analysis.WhitespaceAnalyzer -chunk 200 -wt tf -s 5 -md 3 -x 90 -ng 1
3. bin/mahout kmeans -i reuters/reuters-vectors-bigram/ -c reuters/reuters-initial-clusters -o reuters/reuters-kmeans-clusters -dm org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure -k 20 --maxIter 100
The third command (kmeans) fails with the following exception. Am I doing something wrong?
Exception in thread "main" java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.mahout.math.VectorWritable
    at org.apache.mahout.clustering.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:90)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:102)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:59)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
Thanks,
Sharath