I suspect that the sequenceFile is not written properly, because the 
command-line cluster dumper does not return any points in the dump file. Is 
there any way to verify this?
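
One possible check, as a minimal sketch rather than a confirmed fix: read the file back with Hadoop's SequenceFile.Reader and print the key and value classes plus the first few records. The namespace, the function name dump-sequence-file and the file name part-00000 are placeholders of mine, not anything from Mahout, and I am assuming the Hadoop and Mahout jars are on the classpath so the value class can be instantiated. For clustering input I would expect the value class to be org.apache.mahout.math.VectorWritable.

```clojure
(ns sensei.clustering.seqcheck
  (:import (org.apache.hadoop.conf Configuration)
           (org.apache.hadoop.fs FileSystem Path)
           (org.apache.hadoop.io SequenceFile$Reader)
           (org.apache.hadoop.util ReflectionUtils)))

(defn dump-sequence-file
  "Prints the key/value classes and up to `limit` records of one SequenceFile.
  Hypothetical helper for inspection only."
  [^Configuration conf ^Path path limit]
  (let [fs (FileSystem/get conf)]
    ;; SequenceFile.Reader is Closeable, so with-open closes it for us.
    (with-open [reader (SequenceFile$Reader. fs path conf)]
      (let [k (ReflectionUtils/newInstance (.getKeyClass reader) conf)
            v (ReflectionUtils/newInstance (.getValueClass reader) conf)]
        (println "key class:  " (.getName (.getKeyClass reader)))
        (println "value class:" (.getName (.getValueClass reader)))
        ;; .next fills k and v in place and returns false at end of file.
        (loop [n 0]
          (if (and (< n limit) (.next reader k v))
            (do (println n ":" (str k) "=>" (str v))
                (recur (inc n)))
            (println "records printed:" n)))))))

(comment
  ;; Assumed usage; "part-00000" is a placeholder file name.
  (let [conf (doto (Configuration.)
               (.set "fs.default.name" "hdfs://localhost:9000/"))]
    (dump-sequence-file conf (Path. "test/sensei/part-00000") 5)))
```

If this prints zero records, or a value class other than a vector type, the generation script is the likely culprit; if the records look fine, the problem is probably further downstream.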

Best wishes,
Jeffrey04



>________________________________
>From: Jeffrey <[email protected]>
>To: "[email protected]" <[email protected]>
>Sent: Tuesday, September 6, 2011 6:02 PM
>Subject: Re: Clustering (fkmeans) with Mahout using Clojure
>
>
>When I switch the runSequential option to true, as shown in the following 
>script:
>
>
>    #!./bin/clj
>
>
>    (ns sensei.clustering.fkmeans)
>
>
>    (import org.apache.hadoop.conf.Configuration)
>    (import org.apache.hadoop.fs.Path)
>
>
>    (import org.apache.mahout.clustering.fuzzykmeans.FuzzyKMeansDriver)
>    (import org.apache.mahout.common.distance.EuclideanDistanceMeasure)
>    (import org.apache.mahout.clustering.kmeans.RandomSeedGenerator)
>
>
>    (let [hadoop_configuration ((fn []
>                                    (let [conf (new Configuration)]
>                                      (. conf set "fs.default.name" "hdfs://localhost:9000/")
>                                      conf)))
>          input_path (new Path "test/sensei")
>          output_path (new Path "test/clusters")
>          clusters_in_path (new Path "test/clusters/cluster-0")]
>      (FuzzyKMeansDriver/run
>        hadoop_configuration
>        input_path
>        (RandomSeedGenerator/buildRandom
>          hadoop_configuration
>          input_path
>          clusters_in_path
>          (int 2)
>          (new EuclideanDistanceMeasure))
>        output_path
>        (new EuclideanDistanceMeasure)
>        (double 0.5)
>        (int 10)
>        (float 5.0)
>        true
>        false
>        (double 0.0)
>        true))
>
>
>I get this output:
>
>
>    SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
>    SLF4J: Defaulting to no-operation (NOP) logger implementation
>    SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
>    11/09/06 17:58:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>    11/09/06 17:58:59 INFO compress.CodecPool: Got brand-new compressor
>    11/09/06 17:58:59 INFO compress.CodecPool: Got brand-new decompressor
>    Exception in thread "main" java.lang.IllegalStateException: Clusters is empty!
>            at org.apache.mahout.clustering.fuzzykmeans.FuzzyKMeansDriver.buildClustersSeq(FuzzyKMeansDriver.java:361)
>            at org.apache.mahout.clustering.fuzzykmeans.FuzzyKMeansDriver.buildClusters(FuzzyKMeansDriver.java:343)
>            at org.apache.mahout.clustering.fuzzykmeans.FuzzyKMeansDriver.run(FuzzyKMeansDriver.java:295)
>            at sensei.clustering.fkmeans$eval17.invoke(fkmeans.clj:35)
>            at clojure.lang.Compiler.eval(Compiler.java:6406)
>            at clojure.lang.Compiler.load(Compiler.java:6843)
>            at clojure.lang.Compiler.loadFile(Compiler.java:6804)
>            at clojure.main$load_script.invoke(main.clj:282)
>            at clojure.main$script_opt.invoke(main.clj:342)
>            at clojure.main$main.doInvoke(main.clj:426)
>            at clojure.lang.RestFn.invoke(RestFn.java:436)
>            at clojure.lang.Var.invoke(Var.java:409)
>            at clojure.lang.AFn.applyToHelper(AFn.java:167)
>            at clojure.lang.Var.applyTo(Var.java:518)
>            at clojure.main.main(main.java:37)
>
>
>Am I generating the initial cluster wrong?
>
>
>I have rewritten the script to use FuzzyKMeansDriver.run(String[] args), but 
>it still fails with the same error as the original program (the output is 
>the same as the initial output; it is rather long, so I am not pasting it 
>again here).
>
>
>    #!./bin/clj
>
>
>    (ns sensei.clustering.fkmeans)
>
>
>    (import org.apache.hadoop.conf.Configuration)
>    (import org.apache.hadoop.fs.Path)
>
>
>    (import org.apache.mahout.clustering.fuzzykmeans.FuzzyKMeansDriver)
>    (import org.apache.mahout.common.distance.EuclideanDistanceMeasure)
>    (import org.apache.mahout.clustering.kmeans.RandomSeedGenerator)
>
>
>    (let [hadoop_configuration ((fn []
>                                    (let [conf (new Configuration)]
>                                      (. conf set "fs.default.name" "hdfs://localhost:9000/")
>                                      conf)))
>          driver (new FuzzyKMeansDriver)]
>      (. driver setConf hadoop_configuration)
>      (. driver
>         run
>         (into-array String ["--input" "test/sensei"
>                             "--output" "test/clusters"
>                             "--clusters" "test/clusters/clusters-0"
>                             "--clustering"
>                             "--overwrite"
>                             "--emitMostLikely" "false"
>                             "--numClusters" "3"
>                             "--maxIter" "10"
>                             "--m" "5"])))
>
>
>Best wishes,
>Jeffrey04
>
>>________________________________
>>From: Jeffrey <[email protected]>
>>To: Jake Mannix <[email protected]>; "[email protected]" 
>><[email protected]>
>>Sent: Tuesday, September 6, 2011 3:53 PM
>>Subject: Re: Clustering (fkmeans) with Mahout using Clojure
>>
>>
>>Hi,
>>
>>
>>I took a break from this task and moved on to some other tasks on my list. 
>>When I revisited it this morning, I found a problem with the sort utility 
>>and the LC_COLLATE environment variable that made my sequenceFile 
>>generation script fail. Now I have managed to get the command-line utility 
>>to generate the clusters:
>>
>>
>>    $ bin/mahout fkmeans --input test/sensei --output test/clusters --clusters test/clusters/clusters-0 --clustering --overwrite --emitMostLikely false --numClusters 3 --maxIter 10 --m 5
>>
>>
>>However, when I run the cluster dumper, I only see the three cluster center 
>>points, but not the clustered points, although I included the --clustering 
>>and --emitMostLikely options when I did the clustering:
>>
>>
>>    $ ./bin/mahout clusterdump --seqFileDir test/clusters/clusters-1 --pointsDir test/clusters/clusteredPoints --output sensei.txt
>>
>>
>>
>>I tested this with the latest revision of mahout-0.6-snapshot.
>>
>>
>>When I try to do the clustering with my Clojure code (the same as the one 
>>posted before), it still gives me the same error. Any idea?
>>
>>
>>Regards,
>>Jeffrey04
>>
>>
>>
>>>________________________________
>>>From: Jake Mannix <[email protected]>
>>>To: [email protected]; Jeffrey <[email protected]>
>>>Sent: Friday, August 26, 2011 1:23 AM
>>>Subject: Re: Clustering (fkmeans) with Mahout using Clojure
>>>
>>>
>>>
>>>
>>>
>>>On Thu, Aug 25, 2011 at 10:11 AM, Jeffrey <[email protected]> wrote:
>>>
>>>I am trying to write a short script to cluster my data via clojure (calling 
>>>Mahout classes though). I have my input data in this format (which is an 
>>>output from a 
>>>>
>>>>
>>>
>>>
>>>On this line you're instantiating a new SequentialAccessSparseVector, with 
>>>the value of cardinality being "count (vals photo_list)". You need to have 
>>>all of your Vectors exist with the same cardinality (i.e. they live in the 
>>>same vector space, mathematically). So you need to figure out how big they 
>>>need to be, and instantiate them *all* with this cardinality.
>>> 
>>>                                      (new SequentialAccessSparseVector (count (vals photo_list)))
>>>>
>>>
>>>
>>>
>>>
>>>The error you are getting below:
>>>
>>>
>>>EDIT: apparently cardinality needs to be 1, need to figure out how to do it
>>>>
>>>
>>>
>>>is actually telling you that you're trying to say all vectors should be 
>>>cardinality 1, but it found some vectors with cardinality 10, so it threw an 
>>>exception. 
>>> 
>>>  -jake
>>>
>>>
>>
>>
>
>
