A few things:

- Use trunk.  We are about to release 0.5, and there has been a ton of
progress since 0.4, including several important bug fixes.

- LDA isn't really clustering.  It is more along the lines of SVD as a
dimensionality reduction technique.  It should be possible to display the
internals to find which terms or documents have the highest components
on a single topic, but combinations of topics are still interesting in
LDA, just as combinations of coordinates in SVD are interesting.

- It would probably be more interesting to cluster the LDA
representation using k-means and look at those results.
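
To make the last two points concrete, here is a rough Python sketch (not
Mahout code).  It assumes you have already extracted one vector of topic
proportions per document; the document names and numbers below are
invented, and the kmeans helper is a hypothetical stand-in for whatever
k-means implementation you actually run.  It compares "highest component
on a single topic" with clustering the whole topic vector.

```python
# Hypothetical per-document topic proportions (rows: documents,
# columns: topics). Every value here is made up for illustration.
doc_topics = {
    "doc_a": [0.90, 0.05, 0.05],
    "doc_b": [0.85, 0.10, 0.05],
    "doc_c": [0.05, 0.50, 0.45],
    "doc_d": [0.10, 0.45, 0.45],
    "doc_e": [0.05, 0.05, 0.90],
}


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means with deterministic farthest-point seeding."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Seed with the point farthest from all centroids chosen so far.
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    assignment = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        # Recompute each centroid; keep the old one if a cluster is empty.
        new_centroids = []
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            new_centroids.append(mean(members) if members else centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return assignment, centroids


names = list(doc_topics)
points = [doc_topics[name] for name in names]

# View 1: the single topic with the highest component for each document.
dominant = {n: max(range(len(v)), key=lambda t: v[t]) for n, v in doc_topics.items()}

# View 2: k-means over the full topic vectors.
assignment, centroids = kmeans(points, k=3)
clusters = dict(zip(names, assignment))

for name in names:
    print(name, "dominant topic:", dominant[name], "cluster:", clusters[name])
```

In this toy data, doc_c and doc_d are roughly even mixes of two topics.
Looking only at the dominant topic hides that, while k-means on the full
vectors puts them in their own cluster, separate from doc_e, which is
almost entirely one topic.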

The reason that LDA is grouped together with the clustering algorithms
is that it is unsupervised.  It has some real differences, however.


On Tue, Apr 26, 2011 at 12:16 PM, Ian Helmke <[email protected]> wrote:

> I'm looking at using LDA to cluster documents based on topics. I've
> gotten LDA to work in Mahout 0.4 and I am able to get keywords and
> topics using the built-in mahout utilities.
>
> Is there any simple way to view which documents are assigned to which
> clusters after performing LDA? This could easily be done using
> canopy/kmeans with the -cl option (if I'm using the command line
> utilities), but I don't see any equivalent anywhere in the LDA
> utilities.
>
