When I switch the runSequential option to true, as shown in the following script:

#!./bin/clj
(ns sensei.clustering.fkmeans)

(import org.apache.hadoop.conf.Configuration)
(import org.apache.hadoop.fs.Path)

(import org.apache.mahout.clustering.fuzzykmeans.FuzzyKMeansDriver)
(import org.apache.mahout.common.distance.EuclideanDistanceMeasure)
(import org.apache.mahout.clustering.kmeans.RandomSeedGenerator)

(let [hadoop_configuration ((fn []
                              (let [conf (new Configuration)]
                                (. conf set "fs.default.name" "hdfs://localhost:9000/")
                                conf)))
      input_path (new Path "test/sensei")
      output_path (new Path "test/clusters")
      clusters_in_path (new Path "test/clusters/cluster-0")]
  (FuzzyKMeansDriver/run
    hadoop_configuration
    input_path
    (RandomSeedGenerator/buildRandom
      hadoop_configuration
      input_path
      clusters_in_path
      (int 2)
      (new EuclideanDistanceMeasure))
    output_path
    (new EuclideanDistanceMeasure)
    (double 0.5)
    (int 10)
    (float 5.0)
    true
    false
    (double 0.0)
    true))

I get this output:

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
11/09/06 17:58:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
11/09/06 17:58:59 INFO compress.CodecPool: Got brand-new compressor
11/09/06 17:58:59 INFO compress.CodecPool: Got brand-new decompressor
Exception in thread "main" java.lang.IllegalStateException: Clusters is empty!
    at org.apache.mahout.clustering.fuzzykmeans.FuzzyKMeansDriver.buildClustersSeq(FuzzyKMeansDriver.java:361)
    at org.apache.mahout.clustering.fuzzykmeans.FuzzyKMeansDriver.buildClusters(FuzzyKMeansDriver.java:343)
    at org.apache.mahout.clustering.fuzzykmeans.FuzzyKMeansDriver.run(FuzzyKMeansDriver.java:295)
    at sensei.clustering.fkmeans$eval17.invoke(fkmeans.clj:35)
    at clojure.lang.Compiler.eval(Compiler.java:6406)
    at clojure.lang.Compiler.load(Compiler.java:6843)
    at clojure.lang.Compiler.loadFile(Compiler.java:6804)
    at clojure.main$load_script.invoke(main.clj:282)
    at clojure.main$script_opt.invoke(main.clj:342)
    at clojure.main$main.doInvoke(main.clj:426)
    at clojure.lang.RestFn.invoke(RestFn.java:436)
    at clojure.lang.Var.invoke(Var.java:409)
    at clojure.lang.AFn.applyToHelper(AFn.java:167)
    at clojure.lang.Var.applyTo(Var.java:518)
    at clojure.main.main(main.java:37)

Am I generating the initial cluster wrong?

I rewrote the script to use FuzzyKMeansDriver.run(String[] args), but it still fails with the same error as the original program (the output is the same as the initial output; it is rather long, so I am not pasting it again here).

#!./bin/clj
(ns sensei.clustering.fkmeans)

(import org.apache.hadoop.conf.Configuration)
(import org.apache.hadoop.fs.Path)

(import org.apache.mahout.clustering.fuzzykmeans.FuzzyKMeansDriver)
(import org.apache.mahout.common.distance.EuclideanDistanceMeasure)
(import org.apache.mahout.clustering.kmeans.RandomSeedGenerator)

(let [hadoop_configuration ((fn []
                              (let [conf (new Configuration)]
                                (. conf set "fs.default.name" "hdfs://localhost:9000/")
                                conf)))
      driver (new FuzzyKMeansDriver)]
  (. driver setConf hadoop_configuration)
  (. driver
     run
     (into-array String ["--input" "test/sensei"
                         "--output" "test/clusters"
                         "--clusters" "test/clusters/clusters-0"
                         "--clustering"
                         "--overwrite"
                         "--emitMostLikely" "false"
                         "--numClusters" "3"
                         "--maxIter" "10"
                         "--m" "5"])))

Best wishes,
Jeffrey04

>________________________________
>From: Jeffrey <[email protected]>
>To: Jake Mannix <[email protected]>; "[email protected]" <[email protected]>
>Sent: Tuesday, September 6, 2011 3:53 PM
>Subject: Re: Clustering (fkmeans) with Mahout using Clojure
>
>Hi,
>
>Took a break from this task and moved on with some other tasks in list. When I
>re-visit this task again this morning I found some problem with sort utility
>and LC_COLLATE environment variable that would make my sequenceFile generation
>script fail. Now I managed to get the command line utility to generate the
>clusters
>
>    $ bin/mahout fkmeans --input test/sensei --output test/clusters --clusters
>test/clusters/clusters-0 --clustering --overwrite --emitMostLikely false
>--numClusters 3 --maxIter 10 --m 5
>
>However, when I run cluster dumper, I only see the three cluster center
>points, but not the points although I included --clustering and
>--emitMostLikely options when I do the clustering
>
>    $ ./bin/mahout clusterdump --seqFileDir test/clusters/clusters-1
>--pointsDir test/clusters/clusteredPoints --output sensei.txt
>
>tested this with the latest revision of mahout-0.6-snapshot
>
>When I try to do clustering with my clojure code (same as the one posted
>before), it is still giving me the same error, any idea?
>
>Regards,
>Jeffrey04
>
>>________________________________
>>From: Jake Mannix <[email protected]>
>>To: [email protected]; Jeffrey <[email protected]>
>>Sent: Friday, August 26, 2011 1:23 AM
>>Subject: Re: Clustering (fkmeans) with Mahout using Clojure
>>
>>On Thu, Aug 25, 2011 at 10:11 AM, Jeffrey <[email protected]> wrote:
>>
>>I am trying to write a short script to cluster my data via clojure (calling
>>Mahout classes though). I have my input data in this format (which is an
>>output from a
>>>
>>
>>This line you're instantiating a new SequentialAccessSparseVector, with the
>>value of cardinality being "count (vals photo_list)" - you need to have all
>>of your Vectors exist with the same cardinality (ie. they live in the same
>>vector space, mathematically). So you need to figure out how big they need
>>to be, and instantiate them *all* with this cardinality.
>>
>>    (new SequentialAccessSparseVector
>>(count (vals photo_list)))
>>>
>>
>>The error you are getting below:
>>
>>EDIT: apparently cardinality needs to be 1, need to figure out how to do it
>>>
>>
>>is actually telling you that you're trying to say all vectors should be
>>cardinality 1, but it found some vectors with cardinality 10, so it threw an
>>exception.
>>
>>  -jake
>>
