Hi Sharath,

Just getting back to this: what's in the reuters/reuters21578
directory? Is it text files of some sort, or the Reuters-21578 .sgm
files from
http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.tar.gz ?

To answer your original question: as far as I'm aware, nothing in
particular should currently be broken when running on the Reuters
data. Also, it appears you're not using the build-reuters.sh script,
which is where we're currently seeing problems in certain contexts.

A ClassCastException like the one you're encountering indicates that
one of the sequence files found in the input directory doesn't contain
what kmeans expects: here it found an IntWritable where it expected a
VectorWritable.
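If it isn't obvious which file is the problem, one way to check is to look at the key and value class names recorded in each sequence file's header. Below is a minimal bash sketch of that (not a Mahout tool). It assumes a local copy of the directory (for example fetched with `hadoop fs -get`) and the standard SequenceFile header layout: the bytes "SEQ", one version byte, then the key and value class names, each as a vint length followed by the name. To stay short it only decodes names under 128 bytes, which covers ordinary class names. The function names `byte_at`/`seq_header` and the script name `seq-classes.sh` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list the key/value classes stored in SequenceFile headers.
# ASSUMPTION: header = "SEQ" + 1 version byte + key class name + value
# class name, each name written as a vint length then the bytes. Only
# single-byte vint lengths (names shorter than 128 bytes) are handled.

# byte_at FILE OFFSET -> unsigned decimal value of the byte at OFFSET
byte_at() {
  od -An -tu1 -j "$2" -N 1 "$1" | tr -d ' \n'
}

# seq_header FILE -> prints "KEY_CLASS VALUE_CLASS";
# returns 1 if FILE doesn't look like a SequenceFile we can decode
seq_header() {
  local f=$1 off=4 len i
  local names=()
  [ "$(head -c 3 "$f")" = "SEQ" ] || return 1
  for i in 1 2; do  # first the key class name, then the value class name
    len=$(byte_at "$f" "$off")
    if [ -z "$len" ] || [ "$len" -ge 128 ]; then return 1; fi
    # The name starts right after its one-byte length.
    names+=("$(dd if="$f" bs=1 skip=$((off + 1)) count="$len" 2>/dev/null)")
    off=$((off + 1 + len))
  done
  echo "${names[0]} ${names[1]}"
}

# Report every regular file under each directory given as an argument,
# skipping hidden or underscore-prefixed files such as .crc and _SUCCESS.
for dir in "$@"; do
  find "$dir" -type f ! -name '.*' ! -name '_*' | sort | while IFS= read -r f; do
    if hdr=$(seq_header "$f"); then
      echo "$f: $hdr"
    else
      echo "$f: (not a SequenceFile, or header not understood)"
    fi
  done
done
```

For example, `bash seq-classes.sh reuters/reuters-vectors-bigram` prints one line per file; any file in the kmeans input path whose value class isn't org.apache.mahout.math.VectorWritable is a likely suspect for the ClassCastException.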

If you haven't already tried the following, please do:

bin/mahout kmeans -i reuters/reuters-vectors-bigram/tfidf-vectors \
  -c reuters/reuters-initial-clusters -o reuters/reuters-kmeans-clusters \
  -dm org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure \
  -k 20 --maxIter 100

If you have and it failed, what was the error? And what's in the
tfidf-vectors directory?

Drew

On Mon, Jun 20, 2011 at 2:40 PM, sharath jagannath wrote:
> Hey,
>
> I was testing the kmeans driver using the reuters data.
>
> Commands used:
>
> 1. bin/mahout seqdirectory -c UTF-8 -i reuters/reuters21578 \
>      -o reuters/reuters-seqfiles
> 2. bin/mahout seq2sparse -i reuters/reuters-seqfiles/ \
>      -o reuters/reuters-vectors-bigram -ow \
>      -a org.apache.lucene.analysis.WhitespaceAnalyzer \
>      -chunk 200 -wt tf -s 5 -md 3 -x 90 -ng 1
> 3. bin/mahout kmeans -i reuters/reuters-vectors-bigram/ \
>      -c reuters/reuters-initial-clusters -o reuters/reuters-kmeans-clusters \
>      -dm org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure \
>      -k 20 --maxIter 100
>
> I get the following exception. Am I doing anything wrong?
>
> Exception in thread "main" java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.mahout.math.VectorWritable
>    at org.apache.mahout.clustering.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:90)
>    at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:102)
>    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>    at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:59)
>    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>    at java.lang.reflect.Method.invoke(Method.java:597)
>    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
>
>
> Thanks,
> Sharath
>
