The current Mahout-Examples-Cluster-Reuters build has the same issue: https://builds.apache.org/user/sslavic/my-views/view/Mahout/job/Mahout-Examples-Cluster-Reuters/395/console
Kind regards,
Stevo Slavic.

On Wed, Jul 17, 2013 at 11:42 AM, Fuhrmann Alpert, Galit <[email protected]> wrote:

> Thanks Suneel.
> I tried to add this flag (though I think clusteredPoints directory was
> supposed to be created by default?).
> Either way, for some reason whenever I add '-cl' (tried to run it on
> several data sets), I get the following error:
> "There is no queue named default"
> (even though I do specify a queue by -Dmapred.job.queue.name=...).
> I don't get this error otherwise.
>
> Has anyone ever encountered this error?
> Is there some sort of configuration I'm missing?
>
> Thanks,
>
> Galit.
>
> -----Original Message-----
> From: Suneel Marthi [mailto:[email protected]]
> Sent: Wednesday, July 10, 2013 5:30 PM
> To: [email protected]
> Subject: Re: mahout kmeans not generating clusteredPoint dir?
>
> Been a while since I last worked with this, I believe u r missing the
> clustering option '-cl'.
> Give that a try.
>
> ________________________________
> From: "Fuhrmann Alpert, Galit" <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Wednesday, July 10, 2013 5:17 AM
> Subject: mahout kmeans not generating clusteredPoint dir?
>
> Hello,
>
> I ran mahout kmeans (using rand seeds) on hadoop cluster. It ran
> successfully and created a directory containing clusters-*, including the
> last which was clusters-3-final.
> However, it did not create the clusteredPoints, or at least I cannot find
> it under the same dir (or anywhere else).
>
> My call was:
> mahout kmeans -k 4000 -i inputSeq.dat -o outputPath --maxIter 3
> --clusters outputSeeds
>
> Was there an extra argument I needed to specify in order for it to
> generate the clusteredPoints?
> (BTW I also can't see the outputSeeds. Was it created for seeds and then
> deleted?)
>
> According to mahout in action:
>
> The k-means clustering implementation creates two types of directories in
> the output folder.
> The clusters-* directories are formed at the end of each iteration: the
> clusters-0 directory is generated after the first iteration, clusters-1
> after the second iteration, and so on. These directories contain
> information about the clusters: centroid, standard deviation, and so on.
> The clusteredPoints directory, on the other hand, contains the final
> mapping from cluster ID to document ID. This data is generated from the
> output of the last MapReduce operation.
> The directory listing of the output folder looks something like this:
> $ ls -l reuters-kmeans-clusters
> drwxr-xr-x 4 user 5000 136 Feb 1 18:56 clusters-0
> drwxr-xr-x 4 user 5000 136 Feb 1 18:56 clusters-1
> drwxr-xr-x 4 user 5000 136 Feb 1 18:56 clusters-2
> ...
> drwxr-xr-x 4 user 5000 136 Feb 1 18:59 clusteredPoint
>
> Again, my call did not generate the clusteredPoint directory.
> I would appreciate your help.
>
> Thanks a lot,
>
> Galit.
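
For reference, the kmeans call quoted in this thread, with the '-cl' clustering flag added, might look roughly like the sketch below. It only assembles and prints the command rather than running it. The queue name "myqueue" is a made-up placeholder (the thread elides the real value), and putting the -D option ahead of the job options is an assumption based on the usual Hadoop convention for generic options. This is not a fix for the "There is no queue named default" error reported above; it just shows the shape of the call.

```shell
#!/bin/sh
# Sketch only: build the kmeans command from the thread with -cl added.
# QUEUE is a hypothetical placeholder; the other values come from the original call.
QUEUE="myqueue"
INPUT="inputSeq.dat"
OUTPUT="outputPath"
SEEDS="outputSeeds"

kmeans_cmd() {
  # Prints the command line instead of executing it (dry run).
  # Assumption: the generic -D option is placed before the job-specific options.
  echo mahout kmeans \
    "-Dmapred.job.queue.name=$QUEUE" \
    -i "$INPUT" -o "$OUTPUT" \
    --clusters "$SEEDS" -k 4000 --maxIter 3 \
    -cl
}

kmeans_cmd

# After a real run, the output directory can be listed with:
#   hadoop fs -ls "$OUTPUT"
# to check whether clusteredPoints appears next to the clusters-* directories.
```

To run the job for real, paste the printed line into a shell on a machine where mahout is on the PATH, substituting the actual queue name.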
