After converting the Reuters SGM files to txt format (in reuters-extracted),
give the input path to the first Mahout command, seqdirectory, as
file:///your_dir/reuters-extracted. When I gave the input parameter as
/your_dir/reuters-extracted instead, I got the same problem with k-means
clustering.
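
Roughly, the seqdirectory call looks like the sketch below. /your_dir is a
placeholder for wherever your extracted files live, and the output name
reuters-seqfiles is only my assumption, so adjust both. The point is just
the file:// prefix, which makes the input resolve on the local filesystem.

```shell
# Placeholder: local directory holding the extracted Reuters .txt files.
INPUT_DIR="/your_dir/reuters-extracted"

# Prefix the absolute path with file:// so it is read from the local
# filesystem rather than resolved against the cluster's default filesystem.
INPUT_URI="file://${INPUT_DIR}"

# Hypothetical output location for the generated sequence files.
OUTPUT_DIR="reuters-seqfiles"

echo "seqdirectory input: ${INPUT_URI}"

# Only run the conversion when the mahout launcher is on the PATH.
if command -v mahout >/dev/null 2>&1; then
  mahout seqdirectory -i "${INPUT_URI}" -o "${OUTPUT_DIR}"
fi
```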


On Mon, Jul 29, 2013 at 9:49 AM, Fuhrmann Alpert, Galit <[email protected]> wrote:

>
> Thanks. Was there any fix for this? Or is this an open issue?
>
> -----Original Message-----
> From: Stevo Slavić [mailto:[email protected]]
> Sent: Saturday, July 27, 2013 1:27 AM
> To: [email protected]
> Cc: Suneel Marthi
> Subject: Re: mahout kmeans not generating clusteredPoint dir?
>
> The current Mahout examples cluster Reuters build has the same issue:
>
> https://builds.apache.org/user/sslavic/my-views/view/Mahout/job/Mahout-Examples-Cluster-Reuters/395/console
>
> Kind regards,
> Stevo Slavic.
>
>
> On Wed, Jul 17, 2013 at 11:42 AM, Fuhrmann Alpert, Galit
> <[email protected]> wrote:
>
> >
> > Thanks Suneel.
> > I tried to add this flag (though I think clusteredPoints directory was
> > supposed to be created by default?).
> > Either way, for some reason whenever I add '-cl' (tried to run it on
> > several data sets), I get the following error:
> > "There is no queue named default"
> > (even though I do specify a queue by -Dmapred.job.queue.name=...).
> > I don't get this error otherwise.
> >
> > Has anyone ever encountered this error?
> > Is there some sort of configuration I'm missing?
> >
> > Thanks,
> >
> > Galit.
> >
> > -----Original Message-----
> > From: Suneel Marthi [mailto:[email protected]]
> > Sent: Wednesday, July 10, 2013 5:30 PM
> > To: [email protected]
> > Subject: Re: mahout kmeans not generating clusteredPoint dir?
> >
> > It has been a while since I last worked with this, but I believe you
> > are missing the clustering option '-cl'.
> > Give that a try.
> >
> >
> >
> >
> > ________________________________
> >  From: "Fuhrmann Alpert, Galit" <[email protected]>
> > To: "[email protected]" <[email protected]>
> > Sent: Wednesday, July 10, 2013 5:17 AM
> > Subject: mahout kmeans not generating clusteredPoint dir?
> >
> >
> > Hello,
> >
> > I ran mahout kmeans (using rand seeds) on hadoop cluster. It ran
> > successfully and created a directory containing clusters-*, including
> > the last which was clusters-3-final.
> > However, it did not create the clusteredPoints, or at least I cannot
> > find it under the same dir (or anywhere else).
> >
> > My call was:
> > mahout kmeans  -k 4000 -i inputSeq.dat -o outputPath --maxIter 3
> > --clusters outputSeeds
> >
> > Was there an extra argument I needed to specify in order for it to
> > generate the clusteredPoints?
> > (BTW I also can't see the outputSeeds. Was it created for seeds and
> > then deleted?)
> >
> > According to mahout in action:
> >
> > The k-means clustering implementation creates two types of directories
> > in the output folder. The clusters-* directories are formed at the end
> > of each iteration: the clusters-0 directory is generated after the
> > first iteration, clusters-1 after the second iteration, and so on.
> > These directories contain information about the clusters: centroid,
> > standard deviation, and so on. The clusteredPoints directory, on the
> > other hand, contains the final mapping from cluster ID to document ID.
> > This data is generated from the output of the last MapReduce operation.
> > The directory listing of the output folder looks something like this:
> > $ ls -l reuters-kmeans-clusters
> > drwxr-xr-x 4 user 5000 136 Feb 1 18:56 clusters-0
> > drwxr-xr-x 4 user 5000 136 Feb 1 18:56 clusters-1
> > drwxr-xr-x 4 user 5000 136 Feb 1 18:56 clusters-2
> > ...
> > drwxr-xr-x 4 user 5000 136 Feb 1 18:59 clusteredPoint
> >
> > Again, my call did not generate the clusteredPoint directory.
> > I would appreciate your help.
> >
> > Thanks a lot,
> >
> > Galit.
> >
>
