Ah, that's the ticket. The stack trace shows it is failing in the driver program, which runs client-side. It never gets as far as launching a job.
It looks like it's running out of memory creating a new dense vector in the random seed generator process. I don't know anything more than that about why it happens, whether your input is funny, etc., but that is why it is not getting to Hadoop.

On Fri, Aug 3, 2012 at 5:04 PM, Sears Merritt <[email protected]> wrote:

> Exactly. There isn't an error. The job just runs on a single machine and
> eventually crashes when it exhausts the JVM's memory. I never see it show
> up in the job tracker and never get any map-reduce status output. The full
> output is here:
>
> -bash-4.1$ bin/mahout kmeans -i /users/merritts/rvs -o
> /users/merritts/kmeans_output -c /users/merritts/clusters -k 10000 -x 10
> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
> Running on hadoop, using /usr/lib/hadoop/bin/hadoop and
> HADOOP_CONF_DIR=/usr/lib/hadoop/conf
> MAHOUT-JOB:
> /home/merritts/trunk/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar
> 12/08/03 14:26:52 INFO common.AbstractJob: Command line arguments:
> {--clusters=[/users/merritts/clusters], --convergenceDelta=[0.5],
> --distanceMeasure=[org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure],
> --endPhase=[2147483647], --input=[/users/merritts/rvs], --maxIter=[10],
> --method=[mapreduce], --numClusters=[10000],
> --output=[/users/merritts/kmeans_output], --startPhase=[0],
> --tempDir=[temp]}
> 12/08/03 14:26:52 INFO common.HadoopUtil: Deleting /users/merritts/clusters
> 12/08/03 14:26:53 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 12/08/03 14:26:53 INFO compress.CodecPool: Got brand-new compressor
> 12/08/03 14:26:53 INFO compress.CodecPool: Got brand-new decompressor
> 12/08/03 14:26:53 INFO compress.CodecPool: Got brand-new decompressor
> 12/08/03 14:26:53 INFO compress.CodecPool: Got brand-new decompressor
> 12/08/03 14:26:53 INFO compress.CodecPool: Got brand-new decompressor
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>     at org.apache.mahout.math.DenseVector.<init>(DenseVector.java:54)
>     at org.apache.mahout.math.DenseVector.like(DenseVector.java:115)
>     at org.apache.mahout.math.DenseVector.like(DenseVector.java:28)
>     at org.apache.mahout.math.AbstractVector.times(AbstractVector.java:478)
>     at org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:273)
>     at org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:248)
>     at org.apache.mahout.clustering.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:93)
>     at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:94)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:48)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
>
>
> On Aug 3, 2012, at 3:00 PM, Sean Owen <[email protected]> wrote:
>
> > I don't see an error here...? The warning is an ignorable message from
> > hadoop.
> >
> > On Fri, Aug 3, 2012 at 4:56 PM, Sears Merritt <[email protected]> wrote:
> >
> >> Hi All,
> >>
> >> I'm trying to run a kmeans job using mahout 0.8 on my hadoop cluster
> >> (Cloudera's 0.20.2-cdh3u3) and am running into an odd problem where the
> >> mahout job connects to HDFS for reading/writing data but only runs hadoop
> >> on a single machine, not the entire cluster. To the best of my knowledge I
> >> have all the environment variables configured properly, as you will see
> >> from the output below.
> >>
> >> When I launch the job using the command line tools as follows:
> >>
> >> bin/mahout kmeans -i /users/merritts/rvs -o /users/merritts/kmeans_output
> >> -c /users/merritts/clusters -k 100 -x 10
> >>
> >> I get the following output:
> >>
> >> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
> >> Running on hadoop, using /usr/lib/hadoop/bin/hadoop and
> >> HADOOP_CONF_DIR=/usr/lib/hadoop/conf
> >> MAHOUT-JOB:
> >> /home/merritts/trunk/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar
> >> 12/08/03 14:26:52 INFO common.AbstractJob: Command line arguments:
> >> {--clusters=[/users/merritts/clusters], --convergenceDelta=[0.5],
> >> --distanceMeasure=[org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure],
> >> --endPhase=[2147483647], --input=[/users/merritts/rvs], --maxIter=[10],
> >> --method=[mapreduce], --numClusters=[10000],
> >> --output=[/users/merritts/kmeans_output], --startPhase=[0],
> >> --tempDir=[temp]}
> >> 12/08/03 14:26:52 INFO common.HadoopUtil: Deleting /users/merritts/clusters
> >> 12/08/03 14:26:53 WARN util.NativeCodeLoader: Unable to load native-hadoop
> >> library for your platform... using builtin-java classes where applicable
> >> 12/08/03 14:26:53 INFO compress.CodecPool: Got brand-new compressor
> >> 12/08/03 14:26:53 INFO compress.CodecPool: Got brand-new decompressor
> >>
> >> Has anyone run into this before? If so, how did you fix the issue?
> >>
> >> Thanks for your time,
> >> Sears Merritt
> >
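The arithmetic behind the crash is worth spelling out: the trace shows RandomSeedGenerator materializing k candidate centroids as dense vectors inside the client JVM, so the heap it needs before any MapReduce task launches grows roughly as k times the input dimensionality times 8 bytes per double. A back-of-envelope sketch, where `dim=1000000` is a purely hypothetical stand-in (the thread never states the real dimensionality of the input vectors):

```shell
# Rough client-side heap needed just for the k seed centroids.
# k matches the -k 10000 flag from the command above; dim is a
# hypothetical dimensionality, not a value taken from the thread.
k=10000
dim=1000000
bytes=$(( k * dim * 8 ))          # 8 bytes per double in a dense backing array
gb=$(( bytes / 1000000000 ))
echo "seed centroids alone need roughly ${gb} GB of heap"
```

If the real dimensionality is anywhere near that large, no plausible `-Xmx` rescues the driver, and the structural fix is sparse input vectors or a smaller k. If only a modestly larger heap is needed, Mahout launch scripts of this era read a `MAHOUT_HEAPSIZE` environment variable (in MB) and passed it to the client JVM as `-Xmx`, but confirm that against your own copy of `bin/mahout` before relying on it.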
