Thanks for your response. I'm still confused as I'm trying to run this on real data rather than the reuters example:
If I run kmeans on my data:

    mahout kmeans -k 5 -i inputSeq.dat -o outputPath --maxIter 2 --clusters outputSeeds

it creates a directory containing clusters-*, including clusters-*-final, but it does not create the clusteredPoints directory.

If I run kmeans with the "-cl" flag:

    mahout kmeans -k 5 -i inputSeq.dat -o outputPath --maxIter 2 -Dmapred.job.queue.name=... --clusters outputSeeds -cl

I get the error "There is no queue named default" (even though I do specify a queue by -Dmapred.job.queue.name=...). I do not get this error if I leave out -cl: the job runs just fine, but still does not create the clusteredPoints directory.

For the sake of it, I tried adding "//" to the input path, but this didn't seem to matter. (Which error did it fix for you?)

Did you/anyone manage to overcome this? Does kmeans "-cl" somehow expect an explicit declaration of the queue by another parameter? Is the "queue named default" hard-coded anywhere?

Thanks,
Galit.

-----Original Message-----
From: Taner Diler [mailto:[email protected]]
Sent: Monday, July 29, 2013 3:44 PM
To: [email protected]
Subject: Re: mahout kmeans not generating clusteredPoint dir?

After converting the reuters sgm files to txt format (in reuters-extracted), on the first mahout command, seqdirectory, you should give the input path as file:///your_dir/reuters-extracted. With the input parameter given as /your_dir/reuters-extracted, I got the same problem on k-means clustering.

On Mon, Jul 29, 2013 at 9:49 AM, Fuhrmann Alpert, Galit <[email protected]> wrote:

> Thanks. Was there any fix to this? Or is this an open issue?
>
> -----Original Message-----
> From: Stevo Slavić [mailto:[email protected]]
> Sent: Saturday, July 27, 2013 1:27 AM
> To: [email protected]
> Cc: Suneel Marthi
> Subject: Re: mahout kmeans not generating clusteredPoint dir?
>
> Current Mahout examples cluster Reuters build has same issue:
> https://builds.apache.org/user/sslavic/my-views/view/Mahout/job/Mahout-Examples-Cluster-Reuters/395/console
>
> Kind regards,
> Stevo Slavic.
>
>
> On Wed, Jul 17, 2013 at 11:42 AM, Fuhrmann Alpert, Galit <[email protected]> wrote:
>
> > Thanks Suneel.
> > I tried to add this flag (though I think clusteredPoints directory was supposed to be created by default?).
> > Either way, for some reason whenever I add '-cl' (tried to run it on several data sets), I get the following error:
> > "There is no queue named default"
> > (even though I do specify a queue by -Dmapred.job.queue.name=...).
> > I don't get this error otherwise.
> >
> > Has anyone ever encountered this error?
> > Is there some sort of configuration I'm missing?
> >
> > Thanks,
> >
> > Galit.
> >
> > -----Original Message-----
> > From: Suneel Marthi [mailto:[email protected]]
> > Sent: Wednesday, July 10, 2013 5:30 PM
> > To: [email protected]
> > Subject: Re: mahout kmeans not generating clusteredPoint dir?
> >
> > Been a while since I last worked with this, I believe u r missing the clustering option '-cl'.
> > Give that a try.
> >
> > ________________________________
> > From: "Fuhrmann Alpert, Galit" <[email protected]>
> > To: "[email protected]" <[email protected]>
> > Sent: Wednesday, July 10, 2013 5:17 AM
> > Subject: mahout kmeans not generating clusteredPoint dir?
> >
> > Hello,
> >
> > I ran mahout kmeans (using rand seeds) on hadoop cluster. It ran successfully and created a directory containing clusters-*, including the last which was clusters-3-final.
> > However, it did not create the clusteredPoints, or at least I cannot find it under the same dir (or anywhere else).
> >
> > My call was:
> > mahout kmeans -k 4000 -i inputSeq.dat -o outputPath --maxIter 3 --clusters outputSeeds
> >
> > Was there an extra argument I needed to specify in order for it to generate the clusteredPoints?
> > (BTW I also can't see the outputSeeds. Was it created for seeds and then deleted?)
> >
> > According to mahout in action:
> >
> > The k-means clustering implementation creates two types of directories in the output folder. The clusters-* directories are formed at the end of each iteration: the clusters-0 directory is generated after the first iteration, clusters-1 after the second iteration, and so on. These directories contain information about the clusters: centroid, standard deviation, and so on. The clusteredPoints directory, on the other hand, contains the final mapping from cluster ID to document ID. This data is generated from the output of the last MapReduce operation.
> > The directory listing of the output folder looks something like this:
> > $ ls -l reuters-kmeans-clusters
> > drwxr-xr-x 4 user 5000 136 Feb 1 18:56 clusters-0
> > drwxr-xr-x 4 user 5000 136 Feb 1 18:56 clusters-1
> > drwxr-xr-x 4 user 5000 136 Feb 1 18:56 clusters-2
> > ...
> > drwxr-xr-x 4 user 5000 136 Feb 1 18:59 clusteredPoint
> >
> > Again, my call did not generate the clusteredPoint directory.
> > I would appreciate your help.
> >
> > Thanks a lot,
> >
> > Galit.
> >
>
