-k 10000

Try reducing the number of initial clusters, or use canopy clustering to find
the initial clusters.
Currently the initial number of clusters is too high, which is likely what is
causing the OOM.
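For context: the stack trace below shows the OOM coming from RandomSeedGenerator.buildRandom, which runs in the client JVM before any MapReduce job is submitted; that is also why nothing ever appears in the job tracker. A rough sketch of the canopy-based alternative follows. The -t1/-t2 thresholds and the output paths are placeholders that depend on your data, and the clusters-0-final subdirectory name may differ between Mahout versions:

```shell
# 1. Let canopy clustering choose the initial centroids instead of -k
#    (-t1/-t2 are distance thresholds; the values here are placeholders):
bin/mahout canopy -i /users/merritts/rvs -o /users/merritts/canopies \
  -dm org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure \
  -t1 500 -t2 250

# 2. Feed the canopy centroids to kmeans via -c and drop -k entirely:
bin/mahout kmeans -i /users/merritts/rvs \
  -c /users/merritts/canopies/clusters-0-final \
  -o /users/merritts/kmeans_output -x 10

# If a large -k is really needed, raising the client JVM heap may also help
# (MAHOUT_HEAPSIZE is read by the bin/mahout script, in MB):
export MAHOUT_HEAPSIZE=4096
```

This is a CLI sketch against a live cluster, not something runnable standalone; adjust paths and thresholds before use.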


On 04-08-2012 02:34, Sears Merritt wrote:
Exactly. There isn't an error. The job just runs on a single machine and 
eventually crashes when it exhausts the JVM's memory. I never see it show up in 
the job tracker and never get any map-reduce status output. The full output is 
here:

-bash-4.1$ bin/mahout kmeans -i /users/merritts/rvs -o 
/users/merritts/kmeans_output -c /users/merritts/clusters -k 10000 -x 10
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /usr/lib/hadoop/bin/hadoop and 
HADOOP_CONF_DIR=/usr/lib/hadoop/conf
MAHOUT-JOB: 
/home/merritts/trunk/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar
12/08/03 14:26:52 INFO common.AbstractJob: Command line arguments: 
{--clusters=[/users/merritts/clusters], --convergenceDelta=[0.5], 
--distanceMeasure=[org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure],
 --endPhase=[2147483647], --input=[/users/merritts/rvs], --maxIter=[10], 
--method=[mapreduce], --numClusters=[10000], 
--output=[/users/merritts/kmeans_output], --startPhase=[0], --tempDir=[temp]}
12/08/03 14:26:52 INFO common.HadoopUtil: Deleting /users/merritts/clusters
12/08/03 14:26:53 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
12/08/03 14:26:53 INFO compress.CodecPool: Got brand-new compressor
12/08/03 14:26:53 INFO compress.CodecPool: Got brand-new decompressor
12/08/03 14:26:53 INFO compress.CodecPool: Got brand-new decompressor
12/08/03 14:26:53 INFO compress.CodecPool: Got brand-new decompressor
12/08/03 14:26:53 INFO compress.CodecPool: Got brand-new decompressor
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at org.apache.mahout.math.DenseVector.<init>(DenseVector.java:54)
        at org.apache.mahout.math.DenseVector.like(DenseVector.java:115)
        at org.apache.mahout.math.DenseVector.like(DenseVector.java:28)
        at org.apache.mahout.math.AbstractVector.times(AbstractVector.java:478)
        at org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:273)
        at org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:248)
        at org.apache.mahout.clustering.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:93)
        at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:94)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:48)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:197)


On Aug 3, 2012, at 3:00 PM, Sean Owen <[email protected]> wrote:

I don't see an error here...? The warning is an ignorable message from
Hadoop.

On Fri, Aug 3, 2012 at 4:56 PM, Sears Merritt <[email protected]> wrote:

Hi All,

I'm trying to run a kmeans job using mahout 0.8 on my hadoop cluster
(Cloudera's 0.20.2-cdh3u3) and am running into an odd problem where the
mahout job connects to HDFS for reading/writing data but only runs hadoop
on a single machine, not the entire cluster. To the best of my knowledge I
have all the environment variables configured properly, as you will see
from the output below.

When I launch the job using the command line tools as follows:

bin/mahout kmeans -i /users/merritts/rvs -o /users/merritts/kmeans_output
-c /users/merritts/clusters -k 100 -x 10

I get the following output:

MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /usr/lib/hadoop/bin/hadoop and
HADOOP_CONF_DIR=/usr/lib/hadoop/conf
MAHOUT-JOB:
/home/merritts/trunk/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar
12/08/03 14:26:52 INFO common.AbstractJob: Command line arguments:
{--clusters=[/users/merritts/clusters], --convergenceDelta=[0.5],
--distanceMeasure=[org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure],
--endPhase=[2147483647], --input=[/users/merritts/rvs], --maxIter=[10],
--method=[mapreduce], --numClusters=[10000],
--output=[/users/merritts/kmeans_output], --startPhase=[0],
--tempDir=[temp]}
12/08/03 14:26:52 INFO common.HadoopUtil: Deleting /users/merritts/clusters
12/08/03 14:26:53 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
12/08/03 14:26:53 INFO compress.CodecPool: Got brand-new compressor
12/08/03 14:26:53 INFO compress.CodecPool: Got brand-new decompressor

Has anyone run into this before? If so, how did you fix the issue?

Thanks for your time,
Sears Merritt

