Re: Interpreting Kmeans output

Jeff Eastman Fri, 01 Oct 2010 14:20:23 -0700

I think you pointed it at the wrong type of directory; the -p argument expects the clusteredPoints directory, which contains sequence files with key=clusterId and value=WeightedVectorWritable. I don't know where your clusters directory came from (it's not a standard directory name), but it seems to contain Text. When you ran kmeans, did you specify the clusteringOption (-cl)?
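To make that key/value layout concrete: once a clusteredPoints directory exists, answering "which records fell in CL-1?" amounts to collecting every record under its cluster id. The sketch below is plain Java, not the Mahout or Hadoop API. `ClusteredPoint` and `GroupByCluster` are hypothetical names, an `int` stands in for the clusterId key, a `double[]` stands in for the vector wrapped in WeightedVectorWritable, and the records in `main` are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for one clusteredPoints record: the cluster id plays the role
// of the sequence-file key, the double[] the role of the WeightedVectorWritable value.
final class ClusteredPoint {
    final int clusterId;
    final double[] vector;

    ClusteredPoint(int clusterId, double[] vector) {
        this.clusterId = clusterId;
        this.vector = vector;
    }
}

final class GroupByCluster {
    // Puts each point under its cluster id; TreeMap keeps the ids in ascending order.
    static Map<Integer, List<double[]>> group(List<ClusteredPoint> records) {
        Map<Integer, List<double[]>> byCluster = new TreeMap<>();
        for (ClusteredPoint r : records) {
            byCluster.computeIfAbsent(r.clusterId, id -> new ArrayList<>()).add(r.vector);
        }
        return byCluster;
    }

    public static void main(String[] args) {
        // Hypothetical records, not the data from this thread.
        List<ClusteredPoint> records = Arrays.asList(
            new ClusteredPoint(4, new double[] {1, 2}),
            new ClusteredPoint(1, new double[] {18, 20}),
            new ClusteredPoint(4, new double[] {3, 3}),
            new ClusteredPoint(1, new double[] {19, 21}),
            new ClusteredPoint(1, new double[] {22, 22}));
        for (Map.Entry<Integer, List<double[]>> e : group(records).entrySet()) {
            StringBuilder line = new StringBuilder("CL-" + e.getKey() + ":");
            for (double[] v : e.getValue()) {
                line.append(' ').append(Arrays.toString(v));
            }
            System.out.println(line);
        }
    }
}
```

Running `main` prints `CL-1: [18.0, 20.0] [19.0, 21.0] [22.0, 22.0]` followed by `CL-4: [1.0, 2.0] [3.0, 3.0]`. Reading the real sequence files would additionally need Hadoop's SequenceFile reader, which this sketch deliberately leaves out.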


On 10/1/10 1:34 PM, Matt Tanquary wrote:

I added the pointsDir (-p) option and pointed it to the clusters folder
that I specified in kmeans, and now I get the following error:


  mahout clusterdump -s kmeans/output/clusters-1 -p kmeans/clusters
Running on hadoop, using HADOOP_HOME=/usr/local/install/tools/hadoop
HADOOP_CONF_DIR=/usr/local/install/tools/hadoop/conf
10/10/01 10:29:56 INFO common.AbstractJob: Command line arguments: {--dictionaryType=text, --endPhase=2147483647, --pointsDir=kmeans/clusters, --seqFileDir=kmeans/output/clusters-1, --startPhase=0, --tempDir=temp}
10/10/01 10:29:57 INFO util.NativeCodeLoader: Loaded the native-hadoop library
10/10/01 10:29:57 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
10/10/01 10:29:57 INFO compress.CodecPool: Got brand-new decompressor
Exception in thread "main" java.lang.ClassCastException: class org.apache.hadoop.io.Text
         at java.lang.Class.asSubclass(Class.java:3018)
         at org.apache.mahout.utils.clustering.ClusterDumper.readPoints(ClusterDumper.java:277)
         at org.apache.mahout.utils.clustering.ClusterDumper.init(ClusterDumper.java:218)
         at org.apache.mahout.utils.clustering.ClusterDumper.run(ClusterDumper.java:142)
         at org.apache.mahout.utils.clustering.ClusterDumper.main(ClusterDumper.java:103)
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
         at java.lang.reflect.Method.invoke(Method.java:597)
         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:175)
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
         at java.lang.reflect.Method.invoke(Method.java:597)
         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)


The clusters folder I pointed to has the following contents:

part-randomSeed

Thanks,
M@

On Fri, Oct 1, 2010 at 8:37 AM, Matt Tanquary wrote:

I was able to create clusters using mahout kmeans. Now, when I use
clusterdump to get the output, I see the basic results I expect:

mahout clusterdump -s kmeans/output/clusters-1

CL-1{n=3 c=[19.667, 21.000] r=[1.700, 0.816]}
CL-4{n=2 c=[2.000, 2.500] r=[1.000, 0.500]}

This seems to tell me that there are 2 clusters, the first with
3 records and the second with 2 records.
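For reading these lines: c is the per-dimension center (mean) of the cluster's points, and r is a per-dimension spread. The sketch below assumes r is the population standard deviation computed from running sums (count n, sum s1, sum of squares s2) as sqrt(n*s2 - s1^2)/n; that formula is an assumption, not something stated in this thread. `ClusterStats`, `centerAndRadius`, and `describe` are hypothetical names, and the points in `main` are invented so that the output reproduces the two lines above. They are not the actual records.

```java
import java.util.Locale;

// Sketch under an assumption: c = per-dimension mean,
// r = per-dimension population standard deviation, sqrt(n*s2 - s1^2) / n.
final class ClusterStats {
    // Returns {center, radius} for a non-empty array of equal-length points.
    static double[][] centerAndRadius(double[][] points) {
        int n = points.length;
        int dims = points[0].length;
        double[] s1 = new double[dims]; // running sum of x
        double[] s2 = new double[dims]; // running sum of x^2
        for (double[] p : points) {
            for (int d = 0; d < dims; d++) {
                s1[d] += p[d];
                s2[d] += p[d] * p[d];
            }
        }
        double[] center = new double[dims];
        double[] radius = new double[dims];
        for (int d = 0; d < dims; d++) {
            center[d] = s1[d] / n;
            // Clamp tiny negative rounding error before taking the square root.
            radius[d] = Math.sqrt(Math.max(0.0, n * s2[d] - s1[d] * s1[d])) / n;
        }
        return new double[][] {center, radius};
    }

    // Formats a vector as "[a, b, ...]" with three decimals.
    static String fmt(double[] v) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < v.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(String.format(Locale.ROOT, "%.3f", v[i]));
        }
        return sb.append(']').toString();
    }

    // Produces an "n=... c=[...] r=[...]" summary in the style of the output above.
    static String describe(double[][] points) {
        double[][] cr = centerAndRadius(points);
        return "n=" + points.length + " c=" + fmt(cr[0]) + " r=" + fmt(cr[1]);
    }

    public static void main(String[] args) {
        // Hypothetical points, chosen only to reproduce the CL-4 and CL-1 lines.
        System.out.println(describe(new double[][] {{1, 2}, {3, 3}}));
        System.out.println(describe(new double[][] {{18, 20}, {19, 21}, {22, 22}}));
    }
}
```

Running `main` prints `n=2 c=[2.000, 2.500] r=[1.000, 0.500]` and `n=3 c=[19.667, 21.000] r=[1.700, 0.816]`. Many different point sets give the same summary, so this output alone cannot tell you which records are in which cluster; the per-point assignments come from the clusteredPoints output.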

How do I determine which records fell in CL-1 and which fell in CL-4?

Thanks,
-M@
