Hi All,

I'm trying to run a k-means job using Mahout 0.8 on my Hadoop cluster 
(Cloudera's 0.20.2-cdh3u3) and am running into an odd problem: the Mahout 
job connects to HDFS to read and write data, but the MapReduce portion runs 
on only a single machine rather than across the cluster. To the best of my 
knowledge all the environment variables are configured properly, as you can 
see from the output below.

When I launch the job using the command line tools as follows:

bin/mahout kmeans -i /users/merritts/rvs -o /users/merritts/kmeans_output -c 
/users/merritts/clusters -k 100 -x 10

I get the following output:

MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /usr/lib/hadoop/bin/hadoop and 
HADOOP_CONF_DIR=/usr/lib/hadoop/conf
MAHOUT-JOB: 
/home/merritts/trunk/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar
12/08/03 14:26:52 INFO common.AbstractJob: Command line arguments: 
{--clusters=[/users/merritts/clusters], --convergenceDelta=[0.5], 
--distanceMeasure=[org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure],
 --endPhase=[2147483647], --input=[/users/merritts/rvs], --maxIter=[10], 
--method=[mapreduce], --numClusters=[10000], 
--output=[/users/merritts/kmeans_output], --startPhase=[0], --tempDir=[temp]}
12/08/03 14:26:52 INFO common.HadoopUtil: Deleting /users/merritts/clusters
12/08/03 14:26:53 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
12/08/03 14:26:53 INFO compress.CodecPool: Got brand-new compressor
12/08/03 14:26:53 INFO compress.CodecPool: Got brand-new decompressor
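
For what it's worth, my understanding is that the client decides where to 
submit the job based on the mapred.job.tracker property in the conf directory 
on the classpath: if it is unset or set to "local", Hadoop silently falls back 
to the LocalJobRunner and everything runs in a single JVM, even though HDFS 
paths still resolve. A quick sanity check along those lines (the conf path is 
the one from the log above):

```shell
# Print the JobTracker address the client will use. If this shows "local"
# (or the property is missing entirely), MapReduce jobs run in-process on
# one machine even though HDFS reads/writes still work.
grep -A1 'mapred.job.tracker' "${HADOOP_CONF_DIR:-/usr/lib/hadoop/conf}/mapred-site.xml"
```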

Has anyone run into this before? If so, how did you fix the issue?

Thanks for your time,
Sears Merritt