What's the format of your input vector sequence file? It should be
<key=Writable; value=VectorWritable>, and the key is ignored. The
ClassCastException shows the job reading an IntWritable where it expects
a VectorWritable, so your input data is probably not in that format. I'm
pretty sure you aren't running synthetic control, since it just ran for
me on an EC2 cluster:
$HADOOP_HOME/bin/hadoop jar \
  $MAHOUT_HOME/examples/target/mahout-examples-$MAHOUT_VERSION.job \
  org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
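
One way to check what an input file really contains is to read the key and value class names that every SequenceFile declares in its header. The script below is only a rough sketch, not part of Mahout or Hadoop: the function names are made up for illustration, it assumes the part file has already been copied out of HDFS to local disk, and it assumes the header layout used by SequenceFile versions 4 and later (the bytes "SEQ", a version byte, then the two class names written as length-prefixed UTF-8 strings).

```python
import io
import struct
import sys


def read_vint(stream):
    """Decode one Hadoop variable-length integer (WritableUtils.readVInt layout)."""
    first = struct.unpack("b", stream.read(1))[0]  # signed first byte
    if first >= -112:
        return first  # small values are stored in a single byte
    negative = first < -120
    # Number of big-endian payload bytes that follow the marker byte.
    n_bytes = (-120 - first) if negative else (-112 - first)
    value = int.from_bytes(stream.read(n_bytes), "big")
    return ~value if negative else value


def read_text_string(stream):
    """Read a string written as a vint byte length followed by UTF-8 bytes."""
    length = read_vint(stream)
    return stream.read(length).decode("utf-8")


def read_sequence_file_header(stream):
    """Return (version, key_class, value_class) from a SequenceFile header."""
    if stream.read(3) != b"SEQ":
        raise ValueError("not a SequenceFile: missing SEQ magic bytes")
    version = stream.read(1)[0]
    if version < 4:
        # Assumption: older versions encode the class names differently.
        raise ValueError("unsupported SequenceFile version %d" % version)
    key_class = read_text_string(stream)
    value_class = read_text_string(stream)
    return version, key_class, value_class


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python seqheader.py <local copy of a part file>
    with open(sys.argv[1], "rb") as f:
        version, key_class, value_class = read_sequence_file_header(f)
    print("version:    ", version)
    print("key class:  ", key_class)
    print("value class:", value_class)
```

If the value class printed for a file in the k-means input directory is anything other than org.apache.mahout.math.VectorWritable, that file is not valid clustering input.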
On 5/21/10 4:08 PM, Delroy Cameron wrote:
hey Jeff,
1) i'm not sure i discern the changes in your command below. in any case i
copied and pasted it directly and ran it and it also gave the same exception
as previously
2) i listed the contents on hadoop resulting from the clustering; my
output is below. i interrupted the clustering after the first iteration
because the exception occurs on every iteration. i'm sure there is a way
to look at the vectors to verify that they are not the source of the problem
$ hadoop dfs -ls /user/delroy/
Found 3 items
drwxr-xr-x - delroy delroy 0 2010-05-21 10:04 /user/delroy/clusters
drwxr-xr-x - delroy delroy 0 2010-05-08 04:39 /user/delroy/trecdata-kmeans-vectors
drwxr-xr-x - delroy delroy 0 2010-05-21 07:38 /user/delroy/trecdata-vectors
$ hadoop dfs -ls /user/delroy/trecdata-kmeans-vectors
Found 5 items
-rw-r--r-- 2 delroy delroy 1522195 2010-05-08 04:39 /user/delroy/trecdata-kmeans-vectors/dictionary.file-0
drwxr-xr-x - delroy delroy 0 2010-05-08 04:39 /user/delroy/trecdata-kmeans-vectors/tfidf
drwxr-xr-x - delroy delroy 0 2010-05-08 04:39 /user/delroy/trecdata-kmeans-vectors/tokenized-documents
drwxr-xr-x - delroy delroy 0 2010-05-08 04:39 /user/delroy/trecdata-kmeans-vectors/vectors
drwxr-xr-x - delroy delroy 0 2010-05-08 04:39 /user/delroy/trecdata-kmeans-vectors/wordcount
also i ran the command by specifying only the directory containing the
vectors i.e.
$ hadoop jar mahout/core/target/mahout-core-0.4-SNAPSHOT.job \
org.apache.mahout.clustering.kmeans.KMeansDriver \
-i trecdata-vectors \
-c clusters \
-o trecdata-kmeans-clusters \
-dm org.apache.mahout.common.distance.CosineDistanceMeasure \
-x 20 -cd 0.5 -k 26 -ow -r 8 -cl
and i got the exception below.
10/05/21 19:02:41 INFO common.HadoopUtil: Deleting clusters
10/05/21 19:02:41 INFO util.NativeCodeLoader: Loaded the native-hadoop library
10/05/21 19:02:41 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
10/05/21 19:02:41 INFO compress.CodecPool: Got brand-new compressor
Exception in thread "main" java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.mahout.math.VectorWritable
    at org.apache.mahout.clustering.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:84)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:99)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
-----
--cheers
Delroy