I got a slightly different error on the next line of KMeansDriver.java (running on OS X Snow Leopard):

11/06/08 16:02:12 INFO compress.CodecPool: Got brand-new compressor
Exception in thread "main" java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.mahout.math.VectorWritable
        at org.apache.mahout.clustering.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:90)
        at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:102)

On Sun, Jun 5, 2011 at 9:31 PM, Jeff Eastman <[email protected]> wrote:

> IIRC, Reuters used to run on a cluster but no longer does due to some
> obscure Lucene changes. In 0.5 it only works in local mode. I really hope
> this can be repaired by 0.6 as Reuters is a key entry point into Mahout
> clustering for many users.
>
> -----Original Message-----
> From: Sean Owen [mailto:[email protected]]
> Sent: Sunday, June 05, 2011 11:56 AM
> To: [email protected]
> Subject: Re: Problems running examples
>
> This all sounds a load like things that were fixed a little while ago. Are
> you on version 0.5, or better yet, SVN HEAD?
>
> The rest, I don't know, would have to defer to the author of that bit.
>
> On Sun, Jun 5, 2011 at 7:07 PM, Mark <[email protected]> wrote:
>
> > Hi all. I'm trying to run the examples/bin/build-reuters.sh but I continue
> > to run into the following exception.
> >
> > INFO: Deleting mahout-work/reuters-kmeans-clusters
> > Jun 5, 2011 10:29:37 AM org.apache.hadoop.util.NativeCodeLoader <clinit>
> > WARNING: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> > Jun 5, 2011 10:29:37 AM org.apache.hadoop.io.compress.CodecPool getCompressor
> > INFO: Got brand-new compressor
> > Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
> >         at java.util.ArrayList.RangeCheck(ArrayList.java:547)
> >         at java.util.ArrayList.get(ArrayList.java:322)
> >         at org.apache.mahout.clustering.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:108)
> >         at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:101)
> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >         at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:58)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >         at java.lang.reflect.Method.invoke(Method.java:597)
> >         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> >         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> >         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
> >
> > I am also confused reading the build-reuters.sh code itself. There seems to
> > be some disjunction between what is expected to be local and what should be
> > on HDFS. For example, the comments on lines 77-79 are:
> >
> > # we know reuters-out-seqdir exists on a local disk at
> > # this point, if we're running in clustered mode,
> > # copy it up to hdfs
> >
> > However, upon inspection you'll notice that reuters-out-seqdir is
> > actually on HDFS. It seems like seqdirectory will never write to local
> > disk... even with the MAHOUT_LOCAL=true flag set.
> >
> > Any ideas?
> >
> > Thanks
> >
>

-- 
Yee Yang Li Hector
http://hectorgon.blogspot.com/ (tech + travel)
http://hectorgon.com (book reviews)
