RE: mahout kmeans not generating clusteredPoint dir?

Fuhrmann Alpert, Galit Wed, 31 Jul 2013 02:04:09 -0700

Thanks for your response.
I'm still confused, as I'm trying to run this on real data rather than the 
Reuters example:


If I run kmeans on my data:
mahout kmeans -k 5 -i inputSeq.dat -o outputPath --maxIter 2 --clusters outputSeeds
it creates a directory containing clusters-*, including clusters-*-final,
but it does not create the clusteredPoints directory.
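
One way to confirm whether the directory is really missing is to list the output path. The following is only a minimal sketch, not a command taken from this thread: it assumes the output path is the one passed to -o above and that `hadoop fs -ls` is available on the client. LIST_CMD is a hypothetical override (for example "ls -1") so the same check can be tried against a local directory.

```shell
#!/bin/sh
# Sketch: report whether a k-means output directory contains clusteredPoints.
# Assumption: LIST_CMD is a hypothetical override for the listing command;
# it defaults to "hadoop fs -ls" so the check runs against HDFS.
has_clustered_points() {
    out_dir="$1"
    # Both "hadoop fs -ls" and "ls -1" print one entry per line ending in
    # the entry name, so matching on the line ending covers both.
    ${LIST_CMD:-hadoop fs -ls} "$out_dir" 2>/dev/null | grep -q 'clusteredPoints$'
}

# Example use (outputPath is the -o argument from the kmeans call):
#   has_clustered_points outputPath && echo "found" || echo "missing"
```

The function returns a zero exit status only when an entry named clusteredPoints is listed, so it can be used directly in an `if` or `&&` chain after the kmeans run.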

If I run kmeans with the "-cl" flag:
mahout kmeans -k 5 -i inputSeq.dat -o outputPath --maxIter 2 -Dmapred.job.queue.name=... --clusters outputSeeds -cl
I get the error:
"There is no queue named default" (even though I do specify a queue with 
-Dmapred.job.queue.name=...).
(I do not get this error if I don't add -cl to my call. It then runs just fine, 
but does not create the clusteredPoints directory.)

For the sake of it, I tried adding "//" to the input path, but this didn't seem 
to matter.
(Which error did it fix for you?)

Did you, or anyone else, manage to overcome this?

Does kmeans with "-cl" somehow expect an explicit declaration of the queue 
through another parameter? Is the "queue named default" hard-coded anywhere?

Thanks,

Galit.

-----Original Message-----
From: Taner Diler [mailto:[email protected]] 
Sent: Monday, July 29, 2013 3:44 PM
To: [email protected]
Subject: Re: mahout kmeans not generating clusteredPoint dir?

After converting the Reuters sgm files to txt format (in reuters-extracted), on 
the first mahout command, seqdirectory, you should give the input path as 
file:///your_dir/reuters-extracted. When I gave the input parameter as 
/your_dir/reuters-extracted, I got the same problem with k-means clustering.


On Mon, Jul 29, 2013 at 9:49 AM, Fuhrmann Alpert, Galit <[email protected]> wrote:

>
> Thanks. Was there any fix for this? Or is this an open issue?
>
> -----Original Message-----
> From: Stevo Slavić [mailto:[email protected]]
> Sent: Saturday, July 27, 2013 1:27 AM
> To: [email protected]
> Cc: Suneel Marthi
> Subject: Re: mahout kmeans not generating clusteredPoint dir?
>
> The current Mahout Examples Cluster Reuters build has the same issue:
>
> https://builds.apache.org/user/sslavic/my-views/view/Mahout/job/Mahout-Examples-Cluster-Reuters/395/console
>
> Kind regards,
> Stevo Slavic.
>
>
> On Wed, Jul 17, 2013 at 11:42 AM, Fuhrmann Alpert, Galit
> <[email protected]> wrote:
>
> >
> > Thanks Suneel.
> > I tried to add this flag (though I think clusteredPoints directory 
> > was supposed to be created by default?).
> > Either way, for some reason whenever I add '-cl' (tried to run it on 
> > several data sets), I get the following error:
> > "There is no queue named default"
> > (even though I do specify a queue by -Dmapred.job.queue.name=...).
> > I don't get this error otherwise.
> >
> > Has anyone ever encountered this error?
> > Is there some sort of configuration I'm missing?
> >
> > Thanks,
> >
> > Galit.
> >
> > -----Original Message-----
> > From: Suneel Marthi [mailto:[email protected]]
> > Sent: Wednesday, July 10, 2013 5:30 PM
> > To: [email protected]
> > Subject: Re: mahout kmeans not generating clusteredPoint dir?
> >
> > It's been a while since I last worked with this, but I believe you are 
> > missing the clustering option '-cl'.
> > Give that a try.
> >
> >
> >
> >
> > ________________________________
> >  From: "Fuhrmann Alpert, Galit" <[email protected]>
> > To: "[email protected]" <[email protected]>
> > Sent: Wednesday, July 10, 2013 5:17 AM
> > Subject: mahout kmeans not generating clusteredPoint dir?
> >
> >
> > Hello,
> >
> > I ran mahout kmeans (using random seeds) on a Hadoop cluster. It ran 
> > successfully and created a directory containing clusters-*, 
> > the last of which was clusters-3-final.
> > However, it did not create the clusteredPoints directory, or at least I 
> > cannot find it under the same dir (or anywhere else).
> >
> > My call was:
> > mahout kmeans  -k 4000 -i inputSeq.dat -o outputPath --maxIter 3 
> > --clusters outputSeeds
> >
> > Was there an extra argument I needed to specify in order for it to 
> > generate the clusteredPoints?
> > (BTW I also can't see the outputSeeds. Was it created for the seeds and 
> > then deleted?)
> >
> > According to Mahout in Action:
> >
> > The k-means clustering implementation creates two types of 
> > directories in the output folder. The clusters-* directories are 
> > formed at the end of each iteration: the clusters-0 directory is 
> > generated after the first iteration, clusters-1 after the second 
> > iteration, and so on. These directories contain information about 
> > the clusters: centroid, standard deviation, and so on. The 
> > clusteredPoints directory, on the other hand, contains the final 
> > mapping from cluster ID to document ID. This data is generated from 
> > the output of the last MapReduce operation.
> > The directory listing of the output folder looks something like this:
> > $ ls -l reuters-kmeans-clusters
> > drwxr-xr-x 4 user 5000 136 Feb 1 18:56 clusters-0
> > drwxr-xr-x 4 user 5000 136 Feb 1 18:56 clusters-1
> > drwxr-xr-x 4 user 5000 136 Feb 1 18:56 clusters-2
> > ...
> > drwxr-xr-x 4 user 5000 136 Feb 1 18:59 clusteredPoint
> >
> > Again, my call did not generate the clusteredPoint directory.
> > I would appreciate your help.
> >
> > Thanks a lot,
> >
> > Galit.
> >
>
