So I tried Yahoo LDA on 52M documents with 1000 topics.

Yahoo LDA with a dictionary of 100k terms does 1 iteration every 30 minutes
on a single machine using 4 cores.

Mahout LDA using 20 nodes and a dictionary of 30k terms takes more than a day
per iteration and didn't complete (something about an output error during the
reduce step; this may be a CDHbeta3 issue, I'm not sure, since Reuters clusters
fine).
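
For a rough sense of the gap, here is a back-of-the-envelope comparison of the
two figures above. It is only a sketch: it assumes both runs processed the same
52M-document corpus (not stated explicitly), treats "more than a day" as exactly
24 hours (so the Mahout rate is an upper bound), and ignores the different
dictionary sizes and hardware. The names DOCS and docs_per_second are just for
illustration.

```python
# Back-of-the-envelope throughput comparison from the figures quoted above.
# Assumption: both runs processed the same 52M-document corpus.
DOCS = 52_000_000

YAHOO_SECONDS_PER_ITER = 30 * 60        # 1 iteration every 30 minutes
MAHOUT_SECONDS_PER_ITER = 24 * 60 * 60  # "more than a day", taken as exactly 1 day


def docs_per_second(docs, seconds_per_iteration):
    """Documents swept per second of wall-clock time within one iteration."""
    return docs / seconds_per_iteration


yahoo_rate = docs_per_second(DOCS, YAHOO_SECONDS_PER_ITER)
mahout_rate = docs_per_second(DOCS, MAHOUT_SECONDS_PER_ITER)  # upper bound

# Lower bound on the wall-clock speedup per iteration.
speedup = MAHOUT_SECONDS_PER_ITER / YAHOO_SECONDS_PER_ITER

print(f"Yahoo LDA:  ~{yahoo_rate:,.0f} docs/s (1 machine, 4 cores)")
print(f"Mahout LDA: <{mahout_rate:,.0f} docs/s (20 nodes)")
print(f"Speedup:    at least {speedup:.0f}x per iteration")
```

On these numbers the Yahoo run is at least 48x faster per iteration in
wall-clock time, on one machine versus 20 nodes.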

Hopefully the ideas from the Yahoo version can be incorporated into the
Mahout LDA.

On Fri, Jun 10, 2011 at 6:49 AM, Federico Castanedo <[email protected]> wrote:

> Hi all,
>
> I went through the referenced paper, and it seems that, besides all the
> distributed tasks, the way the inference for \alpha and \beta
> is performed was the key element in improving LDA training performance.
> They use SGD for the hyperparameter adjustment of \alpha.
>
> bests,
> Federico
>
> 2011/6/10 Jake Mannix <[email protected]>
>
> > It's all C++, custom distributed processing, custom distributed
> > coordination and storage.
> >
> > We can certainly try to port over the algorithmic ideas, but the
> > distributed systems stuff would be a significant departure from our
> > current setup - it's not a web service and it's not hadoop, and it's not
> > a command line utility - it's a cluster of long-running processes all
> > intercommunicating.  Sounds awesome, but that's a ways off from where we
> > are now.
> >
> >  -jake
> >
> > On Thu, Jun 9, 2011 at 7:52 PM, Stanley Xu <[email protected]> wrote:
> >
> > > Awesome! Guess it would be much faster than the current version in
> > > Mahout. Is it possible to just use this version in Mahout?
> > >
> > > On Fri, Jun 10, 2011 at 8:12 AM, <[email protected]> wrote:
> > >
> > > > Yahoo released its hadoop code for LDA
> > > >
> > > > http://blog.smola.org/post/6359713161/speeding-up-latent-dirichlet-allocation



-- 
Yee Yang Li Hector
http://hectorgon.blogspot.com/ (tech + travel)
http://hectorgon.com (book reviews)
