I see. Yes, you'll have to perform the transform suggested in the javadoc
to get proper p(term | topic) from the output data.
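
A minimal sketch of that transform, assuming the topic-term counts have
already been read into a plain rectangular double[][] (rows = topics,
columns = terms). The class and method names here are made up for
illustration and are not part of Mahout:

```java
import java.util.Arrays;

/** Hypothetical helper for illustration only; not a Mahout class. */
public final class TopicTermNormalizer {

    private TopicTermNormalizer() {}

    /** Divides each count by its row (topic) sum, so each row becomes p(term | topic). */
    public static double[][] termGivenTopic(double[][] counts) {
        double[][] result = new double[counts.length][];
        for (int topic = 0; topic < counts.length; topic++) {
            double rowSum = 0.0;
            for (double c : counts[topic]) {
                rowSum += c;
            }
            result[topic] = new double[counts[topic].length];
            for (int term = 0; term < counts[topic].length; term++) {
                // An all-zero row is left as zeros rather than dividing by zero.
                result[topic][term] = rowSum > 0.0 ? counts[topic][term] / rowSum : 0.0;
            }
        }
        return result;
    }

    /** Divides each count by its column (term) sum, so each column becomes p(topic | term). */
    public static double[][] topicGivenTerm(double[][] counts) {
        int numTerms = counts.length == 0 ? 0 : counts[0].length;
        double[] colSums = new double[numTerms];
        for (double[] row : counts) {
            for (int term = 0; term < numTerms; term++) {
                colSums[term] += row[term];
            }
        }
        double[][] result = new double[counts.length][numTerms];
        for (int topic = 0; topic < counts.length; topic++) {
            for (int term = 0; term < numTerms; term++) {
                result[topic][term] = colSums[term] > 0.0 ? counts[topic][term] / colSums[term] : 0.0;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy counts: 2 topics x 2 terms. Raw values above 1 are expected here.
        double[][] counts = {{2.0, 6.0}, {1.0, 1.0}};
        System.out.println(Arrays.deepToString(termGivenTopic(counts)));
        System.out.println(Arrays.deepToString(topicGivenTerm(counts)));
    }
}
```

After the row normalization every topic row sums to 1, so no single value
can exceed 1.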

Cheers,
Andy


On Thu, Dec 20, 2012 at 11:33 AM, Sampath Jayarathna <
[email protected]> wrote:

> Hi Andy, I'm not deriving anything, by the way. I guess the output should
> be corrected to p(term | topic).
>
> On Thu, Dec 20, 2012 at 1:13 PM, Andy Schlaikjer <
> [email protected]> wrote:
>
> > Hey Sam, How are you deriving p(word | topic) from the output data? Note
> > from the javadoc of org.apache.mahout.clustering.lda.cvb.TopicModel:
> >
> > /**
> >  * Thin wrapper around a {@link Matrix} of counts of occurrences of (topic, term) pairs.  Dividing
> >  * {@code topicTermCount.viewRow(topic).get(term)} by the sum over the values for all terms in that
> >  * row yields p(term | topic).  Instead dividing it by all topic columns for that term yields
> >  * p(topic | term).
> >  *
> >  * Multithreading is enabled for the {@code update(Matrix)} method: this method is async, and
> >  * merely submits the matrix to a work queue.  When all work has been submitted,
> >  * {@code awaitTermination()} should be called, which will block until updates have been
> >  * accumulated.
> >  */
> > public class TopicModel implements Configurable, Iterable<MatrixSlice> {
> >
> > Andy
> >
> >
> >
> > On Thu, Dec 20, 2012 at 8:11 AM, Sampath Jayarathna <
> > [email protected]> wrote:
> >
> > > Hi,
> > > When I run Mahout LDA using cvb0_local, some of the p(word | topic)
> > > probability values come out greater than 1.
> > > I guess this has something to do with the number of decimal digits
> > > displayed for each probability in the output.
> > > Is there a place where we can change this to full double precision, or
> > > is this some kind of bug in the LDA output?
> > >
> > > Thanks
> > >
> > > Sam
> > >
> >
>
