Hi Sharath,

Just getting back to this -- what is in the reuters/reuters21578 directory? Are they text files of some sort, or are they the reuters-21578 sgm files from http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.tar.gz?

To answer your original question -- there isn't anything in particular that should be broken with running on the reuters data at the moment that I'm aware of, and it appears you're not using the build-reuters.sh script, where we're currently experiencing problems in certain contexts. The ClassCastException of the sort you are encountering indicates that there's a problem with one of the sequence files found in the input directory not containing what's expected.

If you haven't tried the following, do:

bin/mahout kmeans -i reuters/reuters-vectors-bigram/tfidf-vectors -c reuters/reuters-initial-clusters -o reuters/reuters-kmeans-clusters -dm org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure -k 20 --maxIter 100

If you have, and it failed - what was the error? What's in the tfidf-vectors directory?

Drew

On Mon, Jun 20, 2011 at 2:40 PM, sharath jagannath <[email protected]> wrote:
> Hey,
>
> I was testing the kmeans driver using the reuters data.
>
> Commands used:
>
> 1. bin/mahout seqdirectory -c UTF-8 -i reuters/reuters21578 -o
> reuters/reuters-seqfiles
> 2. bin/mahout seq2sparse -i reuters/reuters-seqfiles/ -o
> reuters/reuters-vectors-bigram -ow -a
> org.apache.lucene.analysis.WhitespaceAnalyzer -chunk 200 -wt tf -s 5 -md 3
> -x 90 -ng 1
> 3. bin/mahout kmeans -i reuters/reuters-vectors-bigram/ -c
> reuters/reuters-initial-clusters -o reuters/reuters-kmeans-clusters -dm
> org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure -k 20
> --maxIter 100
>
> I get the following exception. Am I doing anything wrong?
>
> Exception in thread "main" java.lang.ClassCastException:
> org.apache.hadoop.io.IntWritable cannot be cast to
> org.apache.mahout.math.VectorWritable
> at
> org.apache.mahout.clustering.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:90)
> at
> org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:102)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at
> org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:59)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
>
>
> Thanks,
> Sharath
>

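On the point that a ClassCastException like the one quoted above suggests a sequence file in the input directory does not contain what's expected: one rough way to check is to read the class names a file declares in its own header. A Hadoop SequenceFile begins with the bytes SEQ, a version byte, and then the key and value class names as length-prefixed strings. The sketch below is only an illustration under stated assumptions: `seqfile_classes` is a hypothetical helper name, the file is assumed to be on the local filesystem rather than HDFS, and each class name is assumed to be shorter than 128 bytes so that its length prefix fits in one byte.

```shell
# Print the key and value class names recorded in the header of a local
# Hadoop SequenceFile. Hypothetical helper; assumes each class name is
# shorter than 128 bytes, so its length prefix is a single byte.
seqfile_classes() {
  f="$1"
  # Header layout: 'S' 'E' 'Q', a version byte, then two length-prefixed names.
  if [ "$(head -c 3 "$f")" != "SEQ" ]; then
    echo "not a SequenceFile: $f" >&2
    return 1
  fi
  # The length of the key class name is the byte at offset 4.
  klen=$(od -An -tu1 -j4 -N1 "$f" | tr -dc '0-9')
  key=$(tail -c +6 "$f" | head -c "$klen")
  # The value class name follows: length byte at offset 5+klen, then the name.
  vlen=$(od -An -tu1 -j$((5 + klen)) -N1 "$f" | tr -dc '0-9')
  val=$(tail -c +$((7 + klen)) "$f" | head -c "$vlen")
  echo "key=$key value=$val"
}
```

Run against each file under the directory passed to -i, a file reporting org.apache.hadoop.io.IntWritable as its value class, rather than org.apache.mahout.math.VectorWritable, would be a likely source of the exception.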