MiA NewsKMeansClustering Example Help

Chris Harrington Wed, 30 Jan 2013 04:10:55 -0800

Hi all, 

I'm new to Mahout and have been working through the MiA book. Lately I've been
trying the NewsKMeansClustering example from Chapter 10, as it looks like a good
starting point for my own work, but I've run into a problem just trying to run it
and view the output.


I'm trying to view the output of the Java program with the clusterdump
utility, but all I get from it is an empty text file.

I'm using MiA-mahout-0.6 and mahout-distribution-0.6. This is the process I
went through to get to this point.
First, I get the Reuters data and convert it into sequence files (I run these
commands in the mahout-distribution-0.6 project):
    mvn -e -q exec:java \
      -Dexec.mainClass="org.apache.lucene.benchmark.utils.ExtractReuters" \
      -Dexec.args="reuters/ reuters-extracted/"
    bin/mahout seqdirectory -c UTF-8 -i examples/reuters-extracted/ -o reuters-seqfiles
I then manually move (drag and drop) the sequence files into a folder named
reuters-seqfiles in the MiA (0.6) project.
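The drag-and-drop step can also be scripted so that it is repeatable. What follows is a minimal sketch using only java.nio.file; the class name CopySeqFiles and both default paths are hypothetical and should be adjusted to wherever the two projects actually live on disk.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.*;
import java.util.stream.Stream;

public class CopySeqFiles {
    /** Recursively copies everything under src into dst, preserving relative paths. Returns the number of files copied. */
    public static int copyTree(Path src, Path dst) throws IOException {
        int[] copied = {0};
        // Files.walk visits a directory before its contents, so parent dirs exist before files are copied.
        try (Stream<Path> paths = Files.walk(src)) {
            paths.forEach(p -> {
                Path target = dst.resolve(src.relativize(p));
                try {
                    if (Files.isDirectory(p)) {
                        Files.createDirectories(target);
                    } else {
                        Files.copy(p, target, StandardCopyOption.REPLACE_EXISTING);
                        copied[0]++;
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        return copied[0];
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default locations; pass real paths as arguments instead.
        Path src = Paths.get(args.length > 0 ? args[0] : "../mahout-distribution-0.6/reuters-seqfiles");
        Path dst = Paths.get(args.length > 1 ? args[1] : "reuters-seqfiles");
        int n = copyTree(src, dst);
        System.out.println("Copied " + n + " file(s) into " + dst.toAbsolutePath());
    }
}
```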
Next, I run the MiA NewsKMeansClustering example from Chapter 10, which creates
a newsClusters folder populated with various files (a clusters folder,
dictionary.file-0, a centroids folder, etc.).
There don't appear to be any unusual errors in the console:
    2013-01-30 11:15:42.593 java[11011:1903] Unable to load realm info from SCDynamicStore
    SLF4J: The requested version 1.5.11 by your slf4j binding is not compatible with [1.6]
    SLF4J: See http://www.slf4j.org/codes.html#version_mismatch for further details.
    2013-01-30 11:15:45 JobClient [WARN] Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    . (same as above line)
    .
    .
    2013-01-30 11:16:55 NativeCodeLoader [WARN] Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    2013-01-30 11:16:56 JobClient [WARN] Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    . (same as above line)
    .
    .
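Before choosing the -s directory for clusterdump, it may help to see which clusters-* directories the run actually produced, since clusters-19 is only one candidate. What follows is a minimal sketch, assuming the layout newsClusters/clusters/clusters-N seen in the command here, and assuming that Mahout 0.6 may mark the last k-means iteration with a -final suffix (e.g. clusters-N-final). Treat that suffix as something to verify against your own output. The class name ListClusterDirs is hypothetical.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;
import java.util.regex.*;
import java.util.stream.*;

public class ListClusterDirs {
    // Matches names like "clusters-0", "clusters-19" or "clusters-7-final".
    private static final Pattern NAME = Pattern.compile("clusters-(\\d+)(-final)?");

    /** Returns the clusters-* subdirectory names of root, ordered numerically by iteration. */
    public static List<String> clusterDirs(Path root) throws IOException {
        try (Stream<Path> children = Files.list(root)) {
            return children
                .filter(Files::isDirectory)
                .map(p -> p.getFileName().toString())
                .filter(name -> NAME.matcher(name).matches())
                .sorted(Comparator.comparingInt(ListClusterDirs::iteration))
                .collect(Collectors.toList());
        }
    }

    /** Extracts N from "clusters-N" or "clusters-N-final". */
    static int iteration(String name) {
        Matcher m = NAME.matcher(name);
        if (!m.matches()) {
            throw new IllegalArgumentException(name);
        }
        return Integer.parseInt(m.group(1));
    }

    /** Prefers a directory marked "-final"; otherwise falls back to the highest iteration. */
    public static Optional<String> lastIteration(List<String> dirs) {
        Optional<String> fin = dirs.stream().filter(d -> d.endsWith("-final")).findFirst();
        if (fin.isPresent()) {
            return fin;
        }
        return dirs.isEmpty() ? Optional.empty() : Optional.of(dirs.get(dirs.size() - 1));
    }

    public static void main(String[] args) throws IOException {
        // Assumed layout: <output>/clusters/clusters-N[-final]
        Path root = Paths.get(args.length > 0 ? args[0] : "newsClusters/clusters");
        List<String> dirs = clusterDirs(root);
        dirs.forEach(System.out::println);
        System.out.println("Last iteration: " + lastIteration(dirs).orElse("(none found)"));
    }
}
```

Numeric sorting matters here: a plain string sort would put clusters-10 before clusters-2.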
I then run the cluster dump command to create an output.txt file.
    ../mahout-distribution-0.6/bin/mahout clusterdump \
      -s newsClusters/clusters/clusters-19/ -o output.txt \
      -d newsClusters/dictionary.file-0 -dt sequencefile -n 10
but all this does is create an empty text file.
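One plausible cause of an empty output.txt is that the directory passed to -s is missing or contains no data. The following is a minimal sketch for checking that, assuming the cluster data sits in Hadoop-style part-* files directly inside that directory. That naming is an assumption about the job output, not something confirmed in this thread. The class name CheckClusterDir is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class CheckClusterDir {
    /**
     * Sums the sizes of the regular files named part-* directly inside dir
     * (hidden .crc files and markers such as _SUCCESS are ignored).
     * Returns -1 if dir does not exist or is not a directory.
     */
    public static long partBytes(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return -1;
        }
        try (Stream<Path> files = Files.list(dir)) {
            return files
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().startsWith("part-"))
                .mapToLong(p -> {
                    try {
                        return Files.size(p);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                })
                .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        // Default is the directory given to clusterdump -s above; pass another path to check it instead.
        Path dir = Paths.get(args.length > 0 ? args[0] : "newsClusters/clusters/clusters-19");
        long bytes = partBytes(dir);
        if (bytes < 0) {
            System.out.println(dir + " does not exist");
        } else if (bytes == 0) {
            System.out.println(dir + " has no non-empty part-* files");
        } else {
            System.out.println(dir + " holds " + bytes + " bytes of part-* data");
        }
    }
}
```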

Any help would be much appreciated.

Thanks,
Chris
