After converting the Reuters SGM files to text format (in reuters-extracted), give the first Mahout command, seqdirectory, its input path as file:///your_dir/reuters-extracted. When I gave the input parameter as /your_dir/reuters-extracted instead, I got the same problem with k-means clustering.
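A minimal sketch of that first step, in case it helps. The output directory name is a placeholder I made up, and I am assuming the mahout launcher is on your PATH; the point is only that the input is passed as a file:// URI, so Hadoop resolves it on the local file system instead of the default one (e.g. HDFS).

```shell
#!/bin/sh
# Hypothetical locations; substitute your own directories.
LOCAL_DIR="/your_dir/reuters-extracted"
SEQ_OUT="/your_dir/reuters-seqfiles"   # assumed output directory, not from the thread

# Prefix the absolute local path with the file:// scheme so the path is
# looked up on the local file system rather than the default file system.
INPUT_URI="file://${LOCAL_DIR}"
echo "seqdirectory input: ${INPUT_URI}"

# Run the conversion only if the mahout launcher is available.
if command -v mahout >/dev/null 2>&1; then
    mahout seqdirectory -i "${INPUT_URI}" -o "${SEQ_OUT}"
fi
```

Because the local path is absolute, the resulting URI has three slashes after "file:".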

On Mon, Jul 29, 2013 at 9:49 AM, Fuhrmann Alpert, Galit <[email protected]> wrote:

> Thanks. Was there any fix to this? Or is this an open issues?
>
> -----Original Message-----
> From: Stevo Slavić [mailto:[email protected]]
> Sent: Saturday, July 27, 2013 1:27 AM
> To: [email protected]
> Cc: Suneel Marthi
> Subject: Re: mahout kmeans not generating clusteredPoint dir?
>
> Current Mahout examples cluster Reuters build has same issue:
>
> https://builds.apache.org/user/sslavic/my-views/view/Mahout/job/Mahout-Examples-Cluster-Reuters/395/console
>
> Kind regards,
> Stevo Slavic.
>
> On Wed, Jul 17, 2013 at 11:42 AM, Fuhrmann Alpert, Galit
> <[email protected]> wrote:
>
> > Thanks Suneel.
> > I tried to add this flag (though I think clusteredPoints directory was
> > supposed to be created by default?).
> > Either way, for some reason whenever I add '-cl' (tried to run it on
> > several data sets), I get the following error:
> > "There is no queue named default"
> > (even though I do specify a queue by -Dmapred.job.queue.name=...).
> > I don't get this error otherwise.
> >
> > Has anyone ever encountered this error?
> > Is there some sort of configuration I'm missing?
> >
> > Thanks,
> >
> > Galit.
> >
> > -----Original Message-----
> > From: Suneel Marthi [mailto:[email protected]]
> > Sent: Wednesday, July 10, 2013 5:30 PM
> > To: [email protected]
> > Subject: Re: mahout kmeans not generating clusteredPoint dir?
> >
> > Been a while since I last worked with this, I believe u r missing the
> > clustering option '-cl'.
> > Give that a try.
> >
> > ________________________________
> > From: "Fuhrmann Alpert, Galit" <[email protected]>
> > To: "[email protected]" <[email protected]>
> > Sent: Wednesday, July 10, 2013 5:17 AM
> > Subject: mahout kmeans not generating clusteredPoint dir?
> >
> > Hello,
> >
> > I ran mahout kmeans (using rand seeds) on hadoop cluster. It ran
> > successfully and created a directory containing clusters-*, including
> > the last which was clusters-3-final.
> > However, it did not create the clusteredPoints, or at least I cannot
> > find it under the same dir (or anywhere else).
> >
> > My call was:
> > mahout kmeans -k 4000 -i inputSeq.dat -o outputPath --maxIter 3 --clusters outputSeeds
> >
> > Was there an extra argument I needed to specify in order for it to
> > generate the clusteredPoints?
> > (BTW I also can't see the outputSeeds. Was it created for seeds and
> > then deleted?)
> >
> > According to mahout in action:
> >
> > The k-means clustering implementation creates two types of directories
> > in the output folder. The clusters-* directories are formed at the end
> > of each iteration: the clusters-0 directory is generated after the
> > first iteration, clusters-1 after the second iteration, and so on.
> > These directories contain information about the clusters: centroid,
> > standard deviation, and so on. The clusteredPoints directory, on the
> > other hand, contains the final mapping from cluster ID to document ID.
> > This data is generated from the output of the last MapReduce operation.
> > The directory listing of the output folder looks something like this:
> > $ ls -l reuters-kmeans-clusters
> > drwxr-xr-x 4 user 5000 136 Feb 1 18:56 clusters-0
> > drwxr-xr-x 4 user 5000 136 Feb 1 18:56 clusters-1
> > drwxr-xr-x 4 user 5000 136 Feb 1 18:56 clusters-2
> > ...
> > drwxr-xr-x 4 user 5000 136 Feb 1 18:59 clusteredPoint
> >
> > Again, my call did not generate the clusteredPoint directory.
> > I would appreciate your help.
> >
> > Thanks a lot,
> >
> > Galit.
