Hi All,
I'm trying to run a k-means job using Mahout 0.8 on my Hadoop cluster
(Cloudera's 0.20.2-cdh3u3) and am running into an odd problem: the Mahout
job connects to HDFS for reading and writing data, but Hadoop itself runs
on only a single machine, not across the entire cluster. To the best of my
knowledge I have all the environment variables configured properly, as you
will see from the output below.
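In case it's relevant: my understanding is that if mapred.job.tracker is unset or set to "local" in mapred-site.xml under HADOOP_CONF_DIR, Hadoop falls back to the LocalJobRunner and everything executes in a single JVM. On a real cluster I'd expect it to look something like this (host and port below are just placeholders, not my actual values):

```xml
<!-- mapred-site.xml in HADOOP_CONF_DIR; hostname/port are placeholders -->
<property>
  <name>mapred.job.tracker</name>
  <value>jobtracker.example.com:8021</value>
</property>
```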
When I launch the job using the command line tools as follows:
bin/mahout kmeans -i /users/merritts/rvs -o /users/merritts/kmeans_output -c
/users/merritts/clusters -k 100 -x 10
I get the following output:
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /usr/lib/hadoop/bin/hadoop and
HADOOP_CONF_DIR=/usr/lib/hadoop/conf
MAHOUT-JOB:
/home/merritts/trunk/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar
12/08/03 14:26:52 INFO common.AbstractJob: Command line arguments:
{--clusters=[/users/merritts/clusters], --convergenceDelta=[0.5],
--distanceMeasure=[org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure],
--endPhase=[2147483647], --input=[/users/merritts/rvs], --maxIter=[10],
--method=[mapreduce], --numClusters=[10000],
--output=[/users/merritts/kmeans_output], --startPhase=[0], --tempDir=[temp]}
12/08/03 14:26:52 INFO common.HadoopUtil: Deleting /users/merritts/clusters
12/08/03 14:26:53 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
12/08/03 14:26:53 INFO compress.CodecPool: Got brand-new compressor
12/08/03 14:26:53 INFO compress.CodecPool: Got brand-new decompressor
Has anyone run into this before? If so, how did you fix it?
Thanks for your time,
Sears Merritt