I suspect the SequenceFile is not being written properly, because the command-line cluster dumper does not return any points in the dump file :/ Is there any way to verify this?
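One way to verify it is to read the input SequenceFile back and check two things: how many records it holds, and what cardinality each vector has. Mahout's `seqdumper` driver (bin/mahout seqdumper) can print a SequenceFile's contents. From Clojure, reading needs Hadoop's SequenceFile.Reader and Mahout's VectorWritable, so that part is only sketched in a comment below, and it assumes Text keys. The runnable part is a hypothetical helper, `sequence-file-problems`, that checks the [key cardinality] pairs you collect. An empty file and mixed cardinalities are both worth ruling out before clustering.

```clojure
(defn sequence-file-problems
  "Hypothetical helper: sanity-checks [key cardinality] pairs read back from
  the input SequenceFile. Returns a vector of problem descriptions; an empty
  vector means nothing obviously wrong was found."
  [entries]
  (let [cards (distinct (map second entries))]
    (cond
      (empty? entries)    ["no vectors found: the file is empty"]
      (> (count cards) 1) [(str "mixed cardinalities: " (pr-str (vec (sort cards))))]
      :else               [])))

;; Collecting the pairs needs Hadoop and Mahout on the classpath, so it is
;; not run here. Assuming Text keys and VectorWritable values, roughly:
;;
;;   (import '(org.apache.hadoop.fs FileSystem Path)
;;           '(org.apache.hadoop.io SequenceFile$Reader Text)
;;           '(org.apache.mahout.math VectorWritable))
;;   (with-open [rdr (SequenceFile$Reader. (FileSystem/get conf) part-path conf)]
;;     (let [k (Text.) v (VectorWritable.)]
;;       (loop [acc []]
;;         (if (.next rdr k v)
;;           (recur (conj acc [(str k) (.size (.get v))]))
;;           acc))))
;;
;; where part-path is a (hypothetical) Path to one part file under test/sensei.

(sequence-file-problems [["a" 1] ["b" 10]])
;=> ["mixed cardinalities: [1 10]"]
```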
Best wishes,
Jeffrey04

>________________________________
>From: Jeffrey
>Sent: Tuesday, September 6, 2011 6:02 PM
>Subject: Re: Clustering (fkmeans) with Mahout using Clojure
>
>When I switch the runSequential option to true, as shown in the following
>script:
>
>    #!./bin/clj
>
>    (ns sensei.clustering.fkmeans)
>
>    (import org.apache.hadoop.conf.Configuration)
>    (import org.apache.hadoop.fs.Path)
>
>    (import org.apache.mahout.clustering.fuzzykmeans.FuzzyKMeansDriver)
>    (import org.apache.mahout.common.distance.EuclideanDistanceMeasure)
>    (import org.apache.mahout.clustering.kmeans.RandomSeedGenerator)
>
>    (let [hadoop_configuration ((fn []
>                                  (let [conf (new Configuration)]
>                                    (. conf set "fs.default.name" "hdfs://localhost:9000/")
>                                    conf)))
>          input_path (new Path "test/sensei")
>          output_path (new Path "test/clusters")
>          clusters_in_path (new Path "test/clusters/cluster-0")]
>      (FuzzyKMeansDriver/run
>        hadoop_configuration
>        input_path
>        (RandomSeedGenerator/buildRandom    ; writes k random seed clusters
>          hadoop_configuration
>          input_path
>          clusters_in_path
>          (int 2)                           ; k: number of random seeds
>          (new EuclideanDistanceMeasure))
>        output_path
>        (new EuclideanDistanceMeasure)
>        (double 0.5)                        ; convergenceDelta
>        (int 10)                            ; maxIterations
>        (float 5.0)                         ; m (fuzziness)
>        true                                ; runClustering
>        false                               ; emitMostLikely
>        (double 0.0)                        ; threshold
>        true))                              ; runSequential
>
>I get this output:
>
>    SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
>    SLF4J: Defaulting to no-operation (NOP) logger implementation
>    SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
>    11/09/06 17:58:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>    11/09/06 17:58:59 INFO compress.CodecPool: Got brand-new compressor
>    11/09/06 17:58:59 INFO compress.CodecPool: Got brand-new decompressor
>    Exception in thread "main" java.lang.IllegalStateException: Clusters is empty!
>        at org.apache.mahout.clustering.fuzzykmeans.FuzzyKMeansDriver.buildClustersSeq(FuzzyKMeansDriver.java:361)
>        at org.apache.mahout.clustering.fuzzykmeans.FuzzyKMeansDriver.buildClusters(FuzzyKMeansDriver.java:343)
>        at org.apache.mahout.clustering.fuzzykmeans.FuzzyKMeansDriver.run(FuzzyKMeansDriver.java:295)
>        at sensei.clustering.fkmeans$eval17.invoke(fkmeans.clj:35)
>        at clojure.lang.Compiler.eval(Compiler.java:6406)
>        at clojure.lang.Compiler.load(Compiler.java:6843)
>        at clojure.lang.Compiler.loadFile(Compiler.java:6804)
>        at clojure.main$load_script.invoke(main.clj:282)
>        at clojure.main$script_opt.invoke(main.clj:342)
>        at clojure.main$main.doInvoke(main.clj:426)
>        at clojure.lang.RestFn.invoke(RestFn.java:436)
>        at clojure.lang.Var.invoke(Var.java:409)
>        at clojure.lang.AFn.applyToHelper(AFn.java:167)
>        at clojure.lang.Var.applyTo(Var.java:518)
>        at clojure.main.main(main.java:37)
>
>Am I generating the initial clusters incorrectly?
>
>I rewrote the script to use FuzzyKMeansDriver.run(String[] args), but it still
>fails with the same error as the original program (the output is the same as
>before; it's rather long, so I'm not pasting it again here).
>
>    #!./bin/clj
>
>    (ns sensei.clustering.fkmeans)
>
>    (import org.apache.hadoop.conf.Configuration)
>    (import org.apache.hadoop.fs.Path)
>
>    (import org.apache.mahout.clustering.fuzzykmeans.FuzzyKMeansDriver)
>    (import org.apache.mahout.common.distance.EuclideanDistanceMeasure)
>    (import org.apache.mahout.clustering.kmeans.RandomSeedGenerator)
>
>    (let [hadoop_configuration ((fn []
>                                  (let [conf (new Configuration)]
>                                    (. conf set "fs.default.name" "hdfs://localhost:9000/")
>                                    conf)))
>          driver (new FuzzyKMeansDriver)]
>      (. driver setConf hadoop_configuration)
>      (. driver
>         run
>         (into-array String ["--input" "test/sensei"
>                             "--output" "test/clusters"
>                             "--clusters" "test/clusters/clusters-0"
>                             "--clustering"
>                             "--overwrite"
>                             "--emitMostLikely" "false"
>                             "--numClusters" "3"
>                             "--maxIter" "10"
>                             "--m" "5"])))
>
>Best wishes,
>Jeffrey04
>
>>________________________________
>>From: Jeffrey
>>To: Jake Mannix
>>Sent: Tuesday, September 6, 2011 3:53 PM
>>Subject: Re: Clustering (fkmeans) with Mahout using Clojure
>>
>>Hi,
>>
>>I took a break from this task and moved on to some others on my list. When I
>>revisited it this morning, I found a problem with the sort utility and the
>>LC_COLLATE environment variable that made my SequenceFile generation script
>>fail. I have now managed to get the command-line utility to generate the
>>clusters:
>>
>>    $ bin/mahout fkmeans --input test/sensei --output test/clusters \
>>        --clusters test/clusters/clusters-0 --clustering --overwrite \
>>        --emitMostLikely false --numClusters 3 --maxIter 10 --m 5
>>
>>However, when I run the cluster dumper, I only see the three cluster centers,
>>not the points, even though I included the --clustering and --emitMostLikely
>>options when clustering:
>>
>>    $ ./bin/mahout clusterdump --seqFileDir test/clusters/clusters-1 \
>>        --pointsDir test/clusters/clusteredPoints --output sensei.txt
>>
>>I tested this with the latest revision of mahout-0.6-snapshot.
>>
>>When I try to do the clustering with my Clojure code (the same code I posted
>>before), it still gives me the same error. Any ideas?
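For reference, "Clusters is empty!" is what you would see if the seeding step found no input vectors. When RandomSeedGenerator/buildRandom (or --numClusters) is used, Mahout picks k input vectors at random as the initial clusters. If the input directory is empty or unreadable, there is nothing to pick. It might also be worth checking that both runs read the same files. The Clojure script sets fs.default.name to hdfs://localhost:9000/, so the relative path test/sensei resolves under your HDFS home directory. bin/mahout, however, may use a different default file system depending on its environment.

The sketch below illustrates the sampling idea using generic reservoir sampling (Algorithm R). It is not Mahout's RandomSeedGenerator source, and `pick-seeds` is a hypothetical name.

```clojure
(defn pick-seeds
  "Generic reservoir sampling (Algorithm R), not Mahout's implementation.
  Picks up to k items uniformly at random from coll using rng, a
  java.util.Random. Returns all items when coll has fewer than k, and []
  when coll is empty, which is the 'no seed clusters' case."
  [coll k ^java.util.Random rng]
  (reduce (fn [reservoir [i x]]
            (if (< i k)
              (conj reservoir x)                       ; fill the first k slots
              (let [j (.nextInt rng (int (inc i)))]    ; j is uniform in [0, i]
                (if (< j k)
                  (assoc reservoir j x)                ; replace a random slot
                  reservoir))))
          []
          (map-indexed vector coll)))

(pick-seeds [] 3 (java.util.Random. 42))
;=> []
```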
>>
>>Regards,
>>Jeffrey04
>>
>>>________________________________
>>>From: Jake Mannix
>>>To: Jeffrey
>>>Sent: Friday, August 26, 2011 1:23 AM
>>>Subject: Re: Clustering (fkmeans) with Mahout using Clojure
>>>
>>>On Thu, Aug 25, 2011 at 10:11 AM, Jeffrey wrote:
>>>
>>>> I am trying to write a short script to cluster my data via Clojure
>>>> (calling Mahout classes though). I have my input data in this format
>>>> (which is an output from a
>>>
>>>On this line you're instantiating a new SequentialAccessSparseVector with
>>>its cardinality set to (count (vals photo_list)). All of your Vectors need
>>>to have the same cardinality (i.e., they live in the same vector space,
>>>mathematically), so you need to figure out how big they need to be and
>>>instantiate them *all* with that cardinality.
>>>
>>>>     (new SequentialAccessSparseVector
>>>>          (count (vals photo_list)))
>>>
>>>The error you are getting below:
>>>
>>>> EDIT: apparently cardinality needs to be 1, need to figure out how to do it
>>>
>>>is actually telling you that you're effectively saying all vectors should
>>>have cardinality 1, but it found some vectors with cardinality 10, so it
>>>threw an exception.
>>>
>>>  -jake
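Jake's point in practice: compute the cardinality once for the whole data set, not once per record. The sketch below assumes, hypothetically, that each record is a map from a non-negative integer feature index to a weight. `per-record-cardinality` mirrors what (count (vals photo_list)) does. `shared-cardinality` derives a single size that every vector can use: the largest index plus one. If features are numbered through a dictionary instead, the dictionary size plays the same role.

```clojure
;; Hypothetical record shape: {feature-index weight, ...}

(defn per-record-cardinality
  "Size derived from one record, like (count (vals photo_list)). It varies
  from record to record, which is what triggers the cardinality error."
  [record]
  (count (vals record)))

(defn shared-cardinality
  "One size for the whole data set: the largest feature index plus one."
  [records]
  (inc (reduce max -1 (mapcat keys records))))

(def records [{0 1.0} {0 0.5, 3 2.0, 7 1.0}])

(map per-record-cardinality records) ;=> (1 3)
(shared-cardinality records)         ;=> 8

;; With Mahout on the classpath, every vector would then be created with that
;; single size. For example (not run here):
;;   (let [card (shared-cardinality records)]
;;     (for [r records]
;;       (let [v (SequentialAccessSparseVector. (int card))]
;;         (doseq [[i w] r] (.set v (int i) (double w)))
;;         v)))
```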
