Hi Andy, I'm not deriving anything, by the way. I guess the values should be
converted to p(term | topic) from the output.
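
For reference, the normalization described in the quoted javadoc can be
sketched in plain Java. This is only an illustration on a raw double[][]
count matrix, assuming rows are topics, columns are terms, and counts are
non-negative. The class and method names (TopicTermNormalizer,
termGivenTopic, topicGivenTerm) are made up here and are not part of Mahout.

```java
// Hypothetical helper, not part of Mahout: normalizes a raw (topic x term)
// count matrix the way the TopicModel javadoc describes.
final class TopicTermNormalizer {

    // p(term | topic): divide each entry in the topic's row by the row sum.
    static double[] termGivenTopic(double[][] topicTermCounts, int topic) {
        double[] row = topicTermCounts[topic];
        double rowSum = 0.0;
        for (double count : row) {
            rowSum += count;
        }
        double[] probs = new double[row.length];
        if (rowSum == 0.0) {
            return probs; // empty topic: return zeros instead of dividing by zero
        }
        for (int term = 0; term < row.length; term++) {
            probs[term] = row[term] / rowSum;
        }
        return probs;
    }

    // p(topic | term): divide each entry in the term's column by the column sum.
    static double[] topicGivenTerm(double[][] topicTermCounts, int term) {
        int numTopics = topicTermCounts.length;
        double columnSum = 0.0;
        for (int topic = 0; topic < numTopics; topic++) {
            columnSum += topicTermCounts[topic][term];
        }
        double[] probs = new double[numTopics];
        if (columnSum == 0.0) {
            return probs; // unseen term: return zeros instead of dividing by zero
        }
        for (int topic = 0; topic < numTopics; topic++) {
            probs[topic] = topicTermCounts[topic][term] / columnSum;
        }
        return probs;
    }
}
```

With counts {{2, 6}, {1, 1}}, termGivenTopic(counts, 0) gives {0.25, 0.75}.
Each normalized row sums to 1, so no single value can exceed 1, whereas the
raw counts can.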

On Thu, Dec 20, 2012 at 1:13 PM, Andy Schlaikjer <
[email protected]> wrote:

> Hey Sam, how are you deriving p(word | topic) from the output data? Note
> from the javadoc of org.apache.mahout.clustering.lda.cvb.TopicModel:
>
> /**
>  * Thin wrapper around a {@link Matrix} of counts of occurrences of (topic, term) pairs.  Dividing
>  * {@code topicTermCount.viewRow(topic).get(term)} by the sum over the values for all terms in that
>  * row yields p(term | topic).  Instead dividing it by all topic columns for that term yields
>  * p(topic | term).
>  *
>  * Multithreading is enabled for the {@code update(Matrix)} method: this method is async, and
>  * merely submits the matrix to a work queue.  When all work has been submitted,
>  * {@code awaitTermination()} should be called, which will block until updates have been
>  * accumulated.
>  */
> public class TopicModel implements Configurable, Iterable<MatrixSlice> {
>
> Andy
>
>
>
> On Thu, Dec 20, 2012 at 8:11 AM, Sampath Jayarathna <
> [email protected]
> > wrote:
>
> > Hi,
> > When I run Mahout LDA using cvb0_local, some of the p(word | topic)
> > probability values come out greater than 1.
> > I guess this has something to do with the number of decimal digits
> > displayed for each probability in the output.
> > Is there a place where we can change this to double precision? Or
> > is this some kind of bug in the LDA output?
> >
> > Thanks
> >
> > Sam
> >
>
