I see. Yes, you'll have to perform the transform suggested in the javadoc to get proper p(term | topic) from the output data.
Cheers,
Andy

On Thu, Dec 20, 2012 at 11:33 AM, Sampath Jayarathna wrote:
> Hi Andy, I'm not deriving anything, by the way. I guess it should be
> corrected to p(term | topic) from the output.
>
> On Thu, Dec 20, 2012 at 1:13 PM, Andy Schlaikjer wrote:
>
> > Hey Sam, how are you deriving p(word | topic) from the output data? Note
> > the javadoc of org.apache.mahout.clustering.lda.cvb.TopicModel:
> >
> > /**
> >  * Thin wrapper around a {@link Matrix} of counts of occurrences of (topic,
> >  * term) pairs. Dividing {@code topicTermCount.viewRow(topic).get(term)} by
> >  * the sum over the values for all terms in that row yields p(term | topic).
> >  * Instead dividing it by all topic columns for that term yields
> >  * p(topic | term).
> >  *
> >  * Multithreading is enabled for the {@code update(Matrix)} method: this
> >  * method is async, and merely submits the matrix to a work queue. When all
> >  * work has been submitted, {@code awaitTermination()} should be called,
> >  * which will block until updates have been accumulated.
> >  */
> > public class TopicModel implements Configurable, Iterable<MatrixSlice> {
> >
> > Andy
> >
> > On Thu, Dec 20, 2012 at 8:11 AM, Sampath Jayarathna wrote:
> >
> > > Hi,
> > > When I run Mahout LDA using cvb0_local, some of the p(word | topic)
> > > probability values come out greater than 1.
> > > I guess this has something to do with the number of decimal digits
> > > displayed for each probability in the output.
> > > Is there a place where we can change this to Double precision, or
> > > is this some kind of bug in the LDA output?
> > >
> > > Thanks
> > >
> > > Sam
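The transform from the TopicModel javadoc quoted above can be sketched as follows. This is a minimal illustration, assuming the topic-term counts have already been read into a plain `double[][]` (rows = topics, columns = terms) standing in for Mahout's count `Matrix`; `TopicTermNormalizer`, `termGivenTopic` and `topicGivenTerm` are hypothetical names, not part of the Mahout API. The raw entries are counts, which is why values greater than 1 show up before normalization.

```java
import java.util.Arrays;

/**
 * Hypothetical helper sketching the normalization described in the
 * TopicModel javadoc. A plain double[][] (rows = topics, columns = terms)
 * stands in for the Mahout topic-term count Matrix.
 */
class TopicTermNormalizer {

    /** p(term | topic): divide each count by the sum of its row (all terms for that topic). */
    static double[][] termGivenTopic(double[][] counts) {
        double[][] p = new double[counts.length][];
        for (int topic = 0; topic < counts.length; topic++) {
            double rowSum = 0.0;
            for (double c : counts[topic]) {
                rowSum += c;
            }
            p[topic] = new double[counts[topic].length];
            for (int term = 0; term < counts[topic].length; term++) {
                // Guard against an all-zero row so we never produce NaN.
                p[topic][term] = rowSum > 0.0 ? counts[topic][term] / rowSum : 0.0;
            }
        }
        return p;
    }

    /** p(topic | term): divide each count by the sum of its column (all topics for that term). */
    static double[][] topicGivenTerm(double[][] counts) {
        int numTopics = counts.length;
        int numTerms = numTopics == 0 ? 0 : counts[0].length;
        double[] colSums = new double[numTerms];
        for (double[] row : counts) {
            for (int term = 0; term < numTerms; term++) {
                colSums[term] += row[term];
            }
        }
        double[][] p = new double[numTopics][numTerms];
        for (int topic = 0; topic < numTopics; topic++) {
            for (int term = 0; term < numTerms; term++) {
                // Guard against an all-zero column so we never produce NaN.
                p[topic][term] = colSums[term] > 0.0 ? counts[topic][term] / colSums[term] : 0.0;
            }
        }
        return p;
    }

    public static void main(String[] args) {
        // Toy counts for 2 topics x 3 terms; raw values exceed 1, as in the cvb0_local output.
        double[][] counts = {
            {2.0, 6.0, 0.0},
            {3.0, 1.0, 4.0}
        };
        System.out.println(Arrays.deepToString(termGivenTopic(counts)));
        System.out.println(Arrays.deepToString(topicGivenTerm(counts)));
    }
}
```

After `termGivenTopic`, each row sums to 1 (a distribution over terms for that topic); after `topicGivenTerm`, each column sums to 1 (a distribution over topics for that term). Either way every value lies in [0, 1], so no output precision setting is involved.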
