Do we need to do such a transformation? I thought what LDA cvb0_local
gives as output is the topic probability distribution per document and
the term probability distribution per topic?

If both of these outputs are probabilities, I'm not sure why some of the
values are >1. I thought it was something to do with decimal-digit
precision?

Thanks

Sam
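
(For concreteness, below is a minimal sketch of the normalization described
in the TopicModel javadoc quoted further down. It is illustrative only, not
the Mahout implementation: the class and method names are made up, and it
assumes the topic-term counts have already been loaded into a Mahout Matrix.)

import org.apache.mahout.math.DenseMatrix;
import org.apache.mahout.math.Matrix;
import org.apache.mahout.math.Vector;

// Illustrative helper (not part of Mahout): normalizes a (topic x term)
// count matrix into probability distributions.
public class TopicTermNormalizer {

  // p(term | topic): divide each count in a topic's row by the row sum.
  static Vector termDistributionForTopic(Matrix topicTermCounts, int topic) {
    Vector row = topicTermCounts.viewRow(topic);
    return row.divide(row.zSum()); // entries of the result sum to 1.0
  }

  // p(topic | term): divide one count by the sum of that term's column.
  static double topicGivenTerm(Matrix topicTermCounts, int topic, int term) {
    return topicTermCounts.get(topic, term)
        / topicTermCounts.viewColumn(term).zSum();
  }

  public static void main(String[] args) {
    // Toy counts for 2 topics x 3 terms; raw values can be > 1 because
    // they are counts, not probabilities.
    Matrix counts = new DenseMatrix(new double[][] {
        {5.0, 2.0, 3.0},
        {1.0, 4.0, 5.0}
    });
    System.out.println("p(term | topic=0): " + termDistributionForTopic(counts, 0));
    System.out.println("p(topic=1 | term=2): " + topicGivenTerm(counts, 1, 2));
  }
}

Since the raw output values are counts of (topic, term) occurrences rather
than probabilities, values greater than 1 are expected until this division
is applied.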

On Thu, Dec 20, 2012 at 1:48 PM, Andy Schlaikjer <[email protected]> wrote:

> I see. Yes, you'll have to perform the transform suggested in the javadoc
> to get proper p(term | topic) from the output data.
>
> Cheers,
> Andy
>
>
> On Thu, Dec 20, 2012 at 11:33 AM, Sampath Jayarathna <[email protected]> wrote:
>
> > Hi Andy, I'm not deriving anything by the way. I guess it should be
> > corrected to p(term | topic) from the output.
> >
> > On Thu, Dec 20, 2012 at 1:13 PM, Andy Schlaikjer <[email protected]> wrote:
> >
> > > Hey Sam, How are you deriving p(word | topic) from the output data? Note
> > > from the javadoc of org.apache.mahout.clustering.lda.cvb.TopicModel:
> > >
> > > /**
> > >  * Thin wrapper around a {@link Matrix} of counts of occurrences of (topic,
> > >  * term) pairs.  Dividing {@code topicTermCount.viewRow(topic).get(term)}
> > >  * by the sum over the values for all terms in that row yields
> > >  * p(term | topic).  Instead dividing it by all topic columns for that
> > >  * term yields p(topic | term).
> > >  *
> > >  * Multithreading is enabled for the {@code update(Matrix)} method: this
> > >  * method is async, and merely submits the matrix to a work queue.  When
> > >  * all work has been submitted, {@code awaitTermination()} should be
> > >  * called, which will block until updates have been accumulated.
> > >  */
> > > public class TopicModel implements Configurable, Iterable<MatrixSlice> {
> > >
> > > Andy
> > >
> > >
> > >
> > > On Thu, Dec 20, 2012 at 8:11 AM, Sampath Jayarathna <[email protected]> wrote:
> > >
> > > > Hi,
> > > >      When I run Mahout LDA using cvb0_local, some of the p(word | topic)
> > > > probability values are coming up >1.
> > > > I guess this is something to do with the number of decimal digits to
> > > > display as the output per each probability.
> > > > Is there a place where we can change this to a precision with Doubles? or
> > > > is this some kind of a bug in the LDA output?
> > > >
> > > > Thanks
> > > >
> > > > Sam
> > > >
> > >
> >
>
