I think you pointed it at the wrong type of directory. The -p argument
expects the clusteredPoints directory, which contains sequence files with
key=clusterId and value=WeightedVectorWritable. I don't know where your
clusters directory came from (it's not a standard directory name), but it
seems to contain Text. When you ran kmeans, did you specify the
clusteringOption (-cl)?
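
To make the suggestion concrete, here is a minimal sketch of what the corrected clusterdump call might look like. It assumes kmeans was rerun with -cl and that it wrote the point assignments to a clusteredPoints directory directly under the kmeans output directory; that location and the helper name clusterdump_args are assumptions for illustration, not something confirmed in this thread.

```python
import posixpath


def clusterdump_args(kmeans_output, iteration):
    """Build a clusterdump argument list for one k-means iteration.

    Assumption: kmeans was run with -cl and stored its point
    assignments in <kmeans_output>/clusteredPoints.
    """
    return [
        "mahout", "clusterdump",
        # -s: the clusters produced by the chosen iteration
        "-s", posixpath.join(kmeans_output, "clusters-%d" % iteration),
        # -p: the clustered points, not the initial clusters directory
        "-p", posixpath.join(kmeans_output, "clusteredPoints"),
    ]


if __name__ == "__main__":
    # Prints the command line to run by hand.
    print(" ".join(clusterdump_args("kmeans/output", 1)))
```

If the clusteredPoints directory does not exist after the kmeans run, that would suggest -cl was not specified.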
On 10/1/10 1:34 PM, Matt Tanquary wrote:
I added the pointsDir (-p) option, pointed it to the clusters folder
that I specified in kmeans, and now get the following error:
mahout clusterdump -s kmeans/output/clusters-1 -p kmeans/clusters
Running on hadoop, using HADOOP_HOME=/usr/local/install/tools/hadoop
HADOOP_CONF_DIR=/usr/local/install/tools/hadoop/conf
10/10/01 10:29:56 INFO common.AbstractJob: Command line arguments:
{--dictionaryType=text, --endPhase=2147483647,
--pointsDir=kmeans/clusters, --seqFileDir=kmeans/output/clusters-1,
--startPhase=0, --tempDir=temp}
10/10/01 10:29:57 INFO util.NativeCodeLoader: Loaded the native-hadoop library
10/10/01 10:29:57 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
10/10/01 10:29:57 INFO compress.CodecPool: Got brand-new decompressor
Exception in thread "main" java.lang.ClassCastException: class org.apache.hadoop.io.Text
    at java.lang.Class.asSubclass(Class.java:3018)
    at org.apache.mahout.utils.clustering.ClusterDumper.readPoints(ClusterDumper.java:277)
    at org.apache.mahout.utils.clustering.ClusterDumper.init(ClusterDumper.java:218)
    at org.apache.mahout.utils.clustering.ClusterDumper.run(ClusterDumper.java:142)
    at org.apache.mahout.utils.clustering.ClusterDumper.main(ClusterDumper.java:103)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:175)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
The clusters folder I pointed to has the following contents:
part-randomSeed
Thanks,
M@
On Fri, Oct 1, 2010 at 8:37 AM, Matt Tanquary <[email protected]> wrote:
I was able to create clusters using mahout kmeans. Now, I use
clusterdump to get output and I see the basic results I expect:
mahout clusterdump -s kmeans/output/clusters-1
CL-1{n=3 c=[19.667, 21.000] r=[1.700, 0.816]}
CL-4{n=2 c=[2.000, 2.500] r=[1.000, 0.500]}
Which seems to tell me that there are 2 clusters, with the 1st having
3 records and the 2nd having 2 records.
How do I determine which records fell in CL-1 and which fell in CL-4?
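
The reading of the summary lines above can be checked mechanically. The following is a minimal sketch that parses lines in the exact shape shown in this message, treating n as the number of points, c as the cluster center and r as the per-dimension radius. The regular expression and the function name parse_cluster are illustrative assumptions based only on the two sample lines; other Mahout versions may format the output differently.

```python
import re

# Matches lines such as: CL-1{n=3 c=[19.667, 21.000] r=[1.700, 0.816]}
CLUSTER_LINE = re.compile(
    r"^(?P<id>[A-Z]+-\d+)\{n=(?P<n>\d+) "
    r"c=\[(?P<c>[^\]]*)\] r=\[(?P<r>[^\]]*)\]\}$"
)


def _floats(text):
    """Turn '19.667, 21.000' into [19.667, 21.0]."""
    return [float(part) for part in text.split(",") if part.strip()]


def parse_cluster(line):
    """Parse one clusterdump summary line into a dict."""
    match = CLUSTER_LINE.match(line.strip())
    if match is None:
        raise ValueError("unrecognized clusterdump line: %r" % line)
    return {
        "id": match.group("id"),
        "n": int(match.group("n")),           # number of points in the cluster
        "center": _floats(match.group("c")),  # c=[...]
        "radius": _floats(match.group("r")),  # r=[...]
    }


if __name__ == "__main__":
    for raw in (
        "CL-1{n=3 c=[19.667, 21.000] r=[1.700, 0.816]}",
        "CL-4{n=2 c=[2.000, 2.500] r=[1.000, 0.500]}",
    ):
        cluster = parse_cluster(raw)
        print(cluster["id"], "has", cluster["n"], "points")
```

This only summarizes each cluster; it cannot say which records belong to which cluster, because that information is not in these lines.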
Thanks,
-M@