Hi All,
I'm trying to run a k-means job using Mahout 0.8 on my Hadoop cluster
(Cloudera's 0.20.2-cdh3u3) and am running into an odd problem: the Mahout
job connects to HDFS for reading and writing data, but Hadoop itself runs
on only a single machine, not across the entire cluster. To the best of my
knowledge I have all the environment variables configured properly, as you
will see from the output below.
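In case it's relevant: my understanding is that if mapred.job.tracker is unset or set to "local" in mapred-site.xml under HADOOP_CONF_DIR, Hadoop falls back to the LocalJobRunner and everything executes in a single JVM. On a real cluster I'd expect it to look something like this (host and port below are just placeholders, not my actual values):

```xml
<!-- mapred-site.xml in HADOOP_CONF_DIR; hostname/port are placeholders -->
<property>
  <name>mapred.job.tracker</name>
  <value>jobtracker.example.com:8021</value>
</property>
```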
When I launch the job using the command line tools as follows:
bin/mahout kmeans -i /users/merritts/rvs -o /users/merritts/kmeans_output -c
/users/merritts/clusters -k 100 -x 10
I get the following output:
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /usr/lib/hadoop/bin/hadoop and
HADOOP_CONF_DIR=/usr/lib/hadoop/conf
MAHOUT-JOB:
/home/merritts/trunk/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar
12/08/03 14:26:52 INFO common.AbstractJob: Command line arguments:
{--clusters=[/users/merritts/clusters], --convergenceDelta=[0.5],
--distanceMeasure=[org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure],
--endPhase=[2147483647], --input=[/users/merritts/rvs], --maxIter=[10],
--method=[mapreduce], --numClusters=[10000],
--output=[/users/merritts/kmeans_output], --startPhase=[0], --tempDir=[temp]}
12/08/03 14:26:52 INFO common.HadoopUtil: Deleting /users/merritts/clusters
12/08/03 14:26:53 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
12/08/03 14:26:53 INFO compress.CodecPool: Got brand-new compressor
12/08/03 14:26:53 INFO compress.CodecPool: Got brand-new decompressor
Has anyone run into this before? If so, how did you fix it?
Thanks for your time,
Sears Merritt