> I'll second Ted's suggestion of Yahoo's LDA,
> everyone I know who's tried it has been super-impressed
> with its performance.

Cool -- thanks guys!!

On Sun, Sep 18, 2011 at 6:06 PM, Jake Mannix <[email protected]> wrote:
> Timmy,
>
>  Mahout's current LDA implementation would probably have problems with the
> vocabulary size of this data set (it requires the full model [of size:
> numTopics * vocabSize] to live in memory in all mappers), but I've got
> another variation of this codebase which scales a lot better on my GitHub
> Mahout fork <https://github.com/jakemannix/Mahout>, branch name is "cvb0".
> But it hasn't been integrated with Mahout trunk on account of needing
> quite a bit more documentation and cleanup.
>
>  Hopefully I can get some time to clean that code up and get it into Mahout
> trunk, as I have seen it pull off a 10-16x speedup over the current impl
> (and it isn't memory limited in any sense at all, although it *is* a bit
> heavy on disk usage: cf.
> http://twitter.com/#!/lintool/status/104271708420190208 ).
>
>  I'll second Ted's suggestion of Yahoo's LDA, everyone I know who's tried
> it has been super-impressed with its performance.
>
>  -jake
>
> On Sun, Sep 18, 2011 at 3:53 AM, Timmy Wilson <[email protected]> wrote:
>
>> Hi,
>>
>> I'm considering using LDA to cluster a large social graph.
>>
>> Users are documents, relationships are terms --
>> http://www.machinedlearnings.com/2011/03/lda-on-social-graph.html
>>
>> I want to scale to 25M documents @ 200terms/doc (on average).
>>
>> Is this reasonable, are there examples of LDA usage @ this scale?
>>
>> Thanks,
>> Timmy Wilson
>>
>
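
For a rough sense of the memory constraint Jake describes above (the full
numTopics * vocabSize model in every mapper), here is a back-of-the-envelope
sketch. Only the 25M documents and 200 terms/doc come from the thread. The
rest is assumed for illustration: that the vocabulary is about as large as
the user count (since relationships are the terms), a hypothetical 100
topics, and a dense matrix of 8-byte doubles. The helper name
dense_model_bytes is made up here and is not part of Mahout.

```python
def dense_model_bytes(num_topics, vocab_size, bytes_per_entry=8):
    """Bytes needed for a dense num_topics x vocab_size matrix."""
    return num_topics * vocab_size * bytes_per_entry


# Figures taken from the thread.
NUM_DOCS = 25_000_000   # users, treated as documents
TERMS_PER_DOC = 200     # average relationships per user

# Assumptions for illustration only (not stated in the thread).
VOCAB_SIZE = NUM_DOCS   # every user can also appear as a term
NUM_TOPICS = 100        # hypothetical topic count

total_tokens = NUM_DOCS * TERMS_PER_DOC
model_gib = dense_model_bytes(NUM_TOPICS, VOCAB_SIZE) / 2**30

print(f"corpus size: {total_tokens:,} tokens")
print(f"dense model: {model_gib:.1f} GiB per mapper")
```

Under those assumptions the model alone would be on the order of tens of
GiB per mapper, which is roughly why the vocabulary size would be the
sticking point rather than the document count.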
