Re: problems with running K-means on hadoop's pseudo-distributed mode

Wei Zhang Fri, 04 Apr 2014 07:33:43 -0700

Hi Mahesh,

Thanks a lot for the response!

(1) I am using cluster-reuters.sh to run k-means. I simply started the
script and selected k-means as the clustering algorithm. My
HADOOP_HOME environment variable is set, and Hadoop is running in
pseudo-distributed mode.

(2) A /tmp/mahout-work-weiz/reuters-kmeans-clusters/part-randomSeed file
was generated on HDFS, but it is nearly empty (hundreds of bytes).
Compared to the files generated in my local Mahout working directory,
most (if not all) of the files generated on HDFS are nearly empty. That is,
the output files were created but have no content (when I inspect them
with hadoop dfs -text).
I suspect the HDFS writes had some issue.
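
One way to check which HDFS outputs are nearly empty is to scan a directory listing for small files. The following is a minimal sketch, assuming the usual eight-column layout of `hadoop fs -ls` output (permissions, replication, owner, group, size, date, time, path). The function names `parse_ls` and `small_files` and the 1024-byte threshold are made up for illustration and are not part of Hadoop or Mahout.

```python
import subprocess


def parse_ls(ls_output):
    """Turn `hadoop fs -ls`-style text into a list of (path, size_in_bytes).

    Assumes eight whitespace-separated columns per entry; header lines such
    as "Found N items" and directory entries are skipped.
    """
    entries = []
    for line in ls_output.splitlines():
        # Split into at most 8 fields so a path containing spaces stays whole.
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[4].isdigit():
            continue  # not a file/directory entry
        if fields[0].startswith("d"):
            continue  # directory: its size column is not meaningful here
        entries.append((fields[7], int(fields[4])))
    return entries


def small_files(ls_output, threshold=1024):
    """Return the (path, size) pairs whose size is below `threshold` bytes."""
    return [(path, size) for path, size in parse_ls(ls_output) if size < threshold]


if __name__ == "__main__":
    # Working directory taken from the error message in this thread.
    work_dir = "/tmp/mahout-work-weiz/reuters-kmeans-clusters"
    result = subprocess.run(
        ["hadoop", "fs", "-ls", work_dir],
        capture_output=True,
        text=True,
    )
    for path, size in small_files(result.stdout):
        print(f"{size:>10}  {path}")
```

Running the same check against the local working directory (for example with `ls -l` and a matching parser) would show whether only the HDFS copies are truncated.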

(3) The "cluster-reuters.sh" script can successfully finish the text
vectorization Map-Reduce jobs. But again, the output seems to be empty.
I am trying to run it manually in pseudo-distributed mode to investigate
further. I am not sure whether cluster-reuters.sh is intended to run on a
pseudo-distributed Hadoop.

Thanks a lot. Any pointers would be greatly appreciated.

Wei

From:   Mahesh Balija <[email protected]>
To:     user <[email protected]>,
Date:   04/04/2014 02:31 AM
Subject:        Re: problems with running K-means on hadoop's
            pseudo-distributed mode

Hi Wei Zhang,

Can you check whether this path exists in HDFS:
 /tmp/mahout-work-weiz/reuters-kmeans-clusters/part-randomSeed

Instead of using the cluster-reuters.sh script, can you run k-means manually
on your cluster?

BTW, what command are you using to run the cluster-reuters.sh
script?

Best,
Mahesh.B.

On Wed, Apr 2, 2014 at 12:24 AM, Wei Zhang <[email protected]> wrote:

>
> Hello,
>
> I am new to Mahout. I have installed Mahout 0.9.
>
> I have configured Hadoop (1.0.3) on my laptop (Redhat 6, Lenovo W530). I
> am experimenting with the k-means test (by running
> mahout-distribution-0.9/examples/bin/cluster-reuters.sh).
>
> I am able to run the k-means test out of the box on Hadoop in local mode
> successfully.
>
> However, when I run Hadoop in pseudo-distributed mode, the k-means test
> fails (after successfully running 9 Map-Reduce jobs) with the following
> stack trace:
>
> Exception in thread "main" java.lang.IllegalStateException: No input clusters found in /tmp/mahout-work-weiz/reuters-kmeans-clusters/part-randomSeed. Check your -c argument.
>         at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:206)
>         at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:140)
>         at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:103)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:47)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:76)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:607)
>         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:76)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:607)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
> I searched for the cause of this failure but could not find a clear
> explanation. Could you help with some pointers?
>
> Thanks!
>
> Wei
>
