Indeed.

On Mon, Jun 27, 2011 at 5:27 PM, Hector Yee <[email protected]> wrote:

> So I tried Yahoo LDA on 52M documents with 1000 topics.
>
> Yahoo LDA with a dictionary of 100k terms does 1 iteration every 30 minutes
> on a single machine using 4 cores.
>
> Mahout LDA using 20 nodes and a dictionary of 30k terms takes more than a day
> for an iteration and didn't complete (something about an output error during
> the reduce step; this may be a CDHbeta3 issue, I'm not sure, since Reuters
> clusters fine).
>
> Hopefully the ideas from the Yahoo version can be incorporated into the
> Mahout LDA.
>
> On Fri, Jun 10, 2011 at 6:49 AM, Federico Castanedo <[email protected]> wrote:
>
> > Hi all,
> >
> > I went through the referenced paper, and it seems that, besides all the
> > distributed tasks, the way inference for \alpha and \beta is performed
> > was the key element in improving LDA training performance.
> > They use SGD for the hyperparameter adjustment of \alpha.
> >
> > Best,
> > Federico
> >
> > 2011/6/10 Jake Mannix <[email protected]>
> >
> > > It's all C++, custom distributed processing, custom distributed
> > > coordination and storage.
> > >
> > > We can certainly try to port over the algorithmic ideas, but the
> > > distributed systems stuff would be a significant departure from our
> > > current setup: it's not a web service, it's not Hadoop, and it's not a
> > > command-line utility; it's a cluster of long-running processes all
> > > intercommunicating. Sounds awesome, but that's a ways off from where we
> > > are now.
> > >
> > >  -jake
> > >
> > > On Thu, Jun 9, 2011 at 7:52 PM, Stanley Xu <[email protected]> wrote:
> > >
> > > > Awesome! Guess it would be much faster than the current version in
> > > > Mahout. Is it possible to just use this version in Mahout?
> > > >
> > > > On Fri, Jun 10, 2011 at 8:12 AM, <[email protected]> wrote:
> > > >
> > > > > Yahoo released its Hadoop code for LDA
> > > > >
> > > > > http://blog.smola.org/post/6359713161/speeding-up-latent-dirichlet-allocation
>
> --
> Yee Yang Li Hector
> http://hectorgon.blogspot.com/ (tech + travel)
> http://hectorgon.com (book reviews)
>
