Mahout's LDA is not widely used, but I doubt that this scale is a problem
for LDA in general.

Right now, for performance and production use, I would recommend Yahoo's
LDA. See
http://blog.smola.org/post/6359713161/speeding-up-latent-dirichlet-allocation

Mahout's SVD code can handle a problem of this size easily, but SVD isn't at
all the same as LDA; the problem just has roughly the same shape.
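
For a rough sense of that shape, here is a back-of-envelope sketch in Python
using the numbers from your message. The 8 bytes per nonzero entry is purely
my assumption for illustration (a 4-byte term id plus a 4-byte count), not
something either Mahout or Yahoo's LDA prescribes.

```python
# Rough size of the users-by-relationships matrix described in the question.
NUM_DOCS = 25_000_000        # users (documents), from the question
AVG_TERMS_PER_DOC = 200      # relationships (terms) per user, from the question
BYTES_PER_ENTRY = 4 + 4      # ASSUMPTION: int32 term id + int32 count

# Total nonzero entries in the sparse document-term matrix.
nonzeros = NUM_DOCS * AVG_TERMS_PER_DOC

# Raw storage for the entries alone, ignoring row offsets and any overhead.
raw_bytes = nonzeros * BYTES_PER_ENTRY

print(f"nonzero entries: {nonzeros:,}")
print(f"raw sparse size: {raw_bytes / 1e9:.0f} GB")
```

So roughly five billion nonzero entries, which under that assumption is tens
of gigabytes of raw input before any model state is counted.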


On Sun, Sep 18, 2011 at 4:53 AM, Timmy Wilson <[email protected]> wrote:

> Hi,
>
> I'm considering using LDA to cluster a large social graph.
>
> Users are documents, relationships are terms --
> http://www.machinedlearnings.com/2011/03/lda-on-social-graph.html
>
> I want to scale to 25M documents @ 200terms/doc (on average).
>
> Is this reasonable, are there examples of LDA usage @ this scale?
>
> Thanks,
> Timmy Wilson
>
