Hi Mahesh,

Thanks a lot for the response!

(1) I am using cluster-reuters.sh to run k-means. I simply started the script and selected k-means as the clustering algorithm. My HADOOP_HOME environment variable is set, and my Hadoop runs in pseudo-distributed mode.

(2) I did have a /tmp/mahout-work-weiz/reuters-kmeans-clusters/part-randomSeed file generated on HDFS, but it is nearly empty (hundreds of bytes). Compared to the files generated in my local Mahout working directories, most (if not all) of the files generated on HDFS are nearly empty. That is, the output files were created but have no content in them (when I use hadoop dfs -text to look into them). I suspect the HDFS writes had some issue.

(3) The cluster-reuters.sh script can successfully finish the text vectorization Map-Reduce jobs, but again the output seems to be empty. I am trying to run it manually in pseudo-distributed mode to investigate further. I am not sure whether cluster-reuters.sh is intended to run on a pseudo-distributed Hadoop.

Thanks a lot, and any pointers will be greatly appreciated.

Wei

From: Mahesh Balija <[email protected]>
To: user <[email protected]>,
Date: 04/04/2014 02:31 AM
Subject: Re: problems with running K-means on hadoop's pseudo-distributed mode

Hi Wei Zhang,

Can you check whether this path exists in your Hadoop HDFS:
/tmp/mahout-work-weiz/reuters-kmeans-clusters/part-randomSeed

Instead of using the cluster-reuters.sh script, can you run K-means manually on your cluster?

BTW, what command are you using to run the cluster-reuters.sh script?

Best,
Mahesh.B.

On Wed, Apr 2, 2014 at 12:24 AM, Wei Zhang <[email protected]> wrote:
>
> Hello,
>
> I am new to Mahout. I have installed Mahout 0.9.
>
> I have configured Hadoop (1.0.3) on my laptop (Red Hat 6, Lenovo W530). I
> am experimenting with the k-means test (by running
> mahout-distribution-0.9/examples/bin/cluster-reuters.sh).
>
> I am able to run the k-means test out of the box on Hadoop in local mode
> successfully.
>
> However, when I run Hadoop in pseudo-distributed mode, the k-means test
> fails (after successfully running 9 Map-Reduce jobs) with the following
> stack trace:
>
> Exception in thread "main" java.lang.IllegalStateException: No input clusters found in /tmp/mahout-work-weiz/reuters-kmeans-clusters/part-randomSeed. Check your -c argument.
>         at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:206)
>         at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:140)
>         at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:103)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:47)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:76)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:607)
>         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:76)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:607)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
> I tried to google the reason for this failure, but couldn't get a clear
> understanding. Could you help with some pointers?
>
> Thanks!
>
> Wei
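
For anyone following the thread: the "nearly empty files on HDFS" observation can be checked in one pass instead of opening each file by hand. Below is a minimal sketch, not part of Mahout or of cluster-reuters.sh. It assumes the usual `hadoop fs -ls` listing layout (size in the 5th column, path in the 8th); the helper name flag_small_files and the 1024-byte threshold are made up for illustration, and the directory is the one from the stack trace.

```shell
#!/bin/sh
# Sketch: flag suspiciously small files in the k-means seed directory on HDFS.
# WORK_DIR is the path from this thread; MIN_BYTES is an arbitrary, hypothetical cutoff.
WORK_DIR="${WORK_DIR:-/tmp/mahout-work-weiz}"
MIN_BYTES="${MIN_BYTES:-1024}"

# Reads `hadoop fs -ls` style lines on stdin and prints "size path" for every
# regular file (permission string starting with "-") smaller than MIN_BYTES.
# Assumption: size is field 5 and path is field 8 of each listing line.
flag_small_files() {
  awk -v min="$MIN_BYTES" '$1 ~ /^-/ && $5 + 0 < min { print $5, $8 }'
}

if command -v hadoop >/dev/null 2>&1; then
  # List the seed directory on whatever filesystem the Hadoop config points at.
  hadoop fs -ls "$WORK_DIR/reuters-kmeans-clusters" | flag_small_files
else
  echo "hadoop not found on PATH; skipping HDFS check" >&2
fi
```

Any file this prints is a candidate to inspect with hadoop dfs -text, as described in (2) above. Comparing the result against a plain ls -l of the same path on the local disk may also show whether some steps wrote locally while others wrote to HDFS, though that is only a guess at the cause.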
