Hi Grant,

I am running the NewsKMeansClustering class from NetBeans (Run -> Run File). I did not change anything in the class code except the name of the input directory, so that the class can see the dataset I want to cluster. I changed the statement

    String inputDir = "inputDir";

to

    String inputDir = "reuters-seqfiles";

The reuters-seqfiles directory contains the dataset in SequenceFile format. I produced this directory and its data by running the seqdirectory program through the mahout launcher (bin/mahout seqdirectory).

Do you want me to post the code of the NewsKMeansClustering class from the book, or do you already have it?

Thanks,
Ahmad

On Thu, Nov 17, 2011 at 4:57 PM, Grant Ingersoll <[email protected]> wrote:

> What command did you run?
>
> On Nov 16, 2011, at 4:47 AM, Ahmad Ammari wrote:
>
> > Hello,
> >
> > I am practicing the Mahout examples in the clustering part of the book
> > "Mahout in Action", particularly chapter 9. In section 9.1.4, I am trying
> > to run the class NewsKMeansClustering, whose source code I got from the
> > companion source code files. What I understood is that the input directory
> > "inputDir" should contain the input documents in SequenceFile format.
> > Therefore, I pointed it at the "reuters-seqfiles" directory that we
> > generated using the seqdirectory program through the mahout launcher in
> > chapter 8 (page 139). I then ran NewsKMeansClustering, which started to
> > run fine until I got a java.lang.IllegalStateException saying that no
> > clusters were found, as follows:
> >
> > java.lang.IllegalStateException: No clusters found. Check your -c path.
> >         at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:60)
> >         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> >         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> >         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> > 16-Nov-2011 00:49:14 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
> > INFO: map 0% reduce 0%
> > 16-Nov-2011 00:49:14 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
> > INFO: Job complete: job_local_0010
> > 16-Nov-2011 00:49:14 org.apache.hadoop.mapred.Counters log
> > INFO: Counters: 0
> > Exception in thread "main" java.lang.InterruptedException: K-Means Iteration failed processing reutersClusters/canopy-centroids/clusters-0
> >         at org.apache.mahout.clustering.kmeans.KMeansDriver.runIteration(KMeansDriver.java:363)
> >         at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClustersMR(KMeansDriver.java:310)
> >         at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:237)
> >         at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:152)
> >         at clusterer.NewsKMeansClustering.main(NewsKMeansClustering.java:81)
> > ------------------------------------------------------------------------
> > BUILD FAILURE
> > ------------------------------------------------------------------------
> > Total time: 15.391s
> > Finished at: Wed Nov 16 00:49:14 GMT 2011
> > Final Memory: 10M/150M
> > ------------------------------------------------------------------------
> > Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2:exec (default-cli) on project mahout-examples: Command execution failed. Process exited with an error: 1(Exit value: 1) -> [Help 1]
> >
> > To see the full stack trace of the errors, re-run Maven with the -e switch.
> > Re-run Maven using the -X switch to enable full debug logging.
> >
> > For more information about the errors and possible solutions, please read
> > the following articles:
> > [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
> >
> > What does it mean that no clusters were found? Is the input directory
> > wrong? If so, what input should I give the class? I tried changing the
> > canopy thresholds (250, 120) to other values, and I also tried changing
> > the EuclideanDistanceMeasure for the canopy clustering to
> > CosineDistanceMeasure, with no success.
> >
> > Many thanks in advance,
> > Ahmad
>
> --------------------------------------------
> Grant Ingersoll
> http://www.lucidimagination.com
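One way to follow up on the "Check your -c path" hint is to look at what the canopy step actually left on disk before K-Means tried to read reutersClusters/canopy-centroids/clusters-0. The sketch below is a hypothetical helper (ListCanopyOutput and listEntries are illustrative names, not part of Mahout or the book's sample code) that uses only java.nio.file. It assumes the job ran in local mode, as the LocalJobRunner frames in the trace suggest, so the output sits on the local filesystem, and that relative paths such as reutersClusters and reuters-seqfiles resolve against whatever working directory NetBeans uses for the run.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper, not part of Mahout or the book's sample code.
public class ListCanopyOutput {

    /** Returns the sorted names of the entries directly under dir, or an empty list if dir is not a directory. */
    static List<String> listEntries(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        if (!Files.isDirectory(dir)) {
            return names; // nothing was written at this path
        }
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                names.add(entry.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Default taken from the error message; assumes local mode, so the path is
        // resolved on the local filesystem relative to the current working directory.
        Path canopyDir = Paths.get(args.length > 0 ? args[0] : "reutersClusters/canopy-centroids");
        System.out.println("Checking " + canopyDir.toAbsolutePath());

        List<String> entries = listEntries(canopyDir);
        if (entries.isEmpty()) {
            System.out.println("  (missing or empty)");
        }
        for (String name : entries) {
            Path child = canopyDir.resolve(name);
            // For subdirectories (e.g. clusters-*), also show what they contain.
            String suffix = Files.isDirectory(child) ? "/ " + listEntries(child) : "";
            System.out.println("  " + name + suffix);
        }
    }
}
```

Running it with no arguments prints the absolute path it checked and the names one level down, which can be compared against the clusters-0 path that KMeansDriver was given. A missing directory, or one with a different name, would suggest the problem lies in the path handed to K-Means rather than in the canopy thresholds or the distance measure.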
