Hi Andy,

I'm not deriving anything, by the way. I guess my wording should be corrected to p(term | topic), as read from the output.
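For reference, here is a minimal sketch of the normalization that the TopicModel javadoc describes. It does not use Mahout itself: the class name TopicTermNormalization, the method names, and the sample counts are all made up for illustration, and a plain rectangular double[][] (rows = topics, columns = terms) stands in for the Mahout Matrix. If the output being read holds unnormalized (topic, term) counts, which is only an assumption on my part, entries above 1 would be expected until each one is divided by its row sum.

```java
public class TopicTermNormalization {

    // Hypothetical helper: divides each entry by the sum of its row,
    // which gives p(term | topic) according to the javadoc's description.
    // Assumes a rectangular matrix with rows = topics, columns = terms.
    static double[][] pTermGivenTopic(double[][] counts) {
        double[][] result = new double[counts.length][];
        for (int topic = 0; topic < counts.length; topic++) {
            double rowSum = 0.0;
            for (double c : counts[topic]) {
                rowSum += c;
            }
            result[topic] = new double[counts[topic].length];
            for (int term = 0; term < counts[topic].length; term++) {
                // Guard against an all-zero row to avoid dividing by zero.
                result[topic][term] = rowSum > 0.0 ? counts[topic][term] / rowSum : 0.0;
            }
        }
        return result;
    }

    // Hypothetical helper: divides each entry by the sum of its column,
    // which gives p(topic | term) according to the javadoc's description.
    static double[][] pTopicGivenTerm(double[][] counts) {
        int numTopics = counts.length;
        int numTerms = numTopics == 0 ? 0 : counts[0].length;
        double[] colSums = new double[numTerms];
        for (double[] row : counts) {
            for (int term = 0; term < numTerms; term++) {
                colSums[term] += row[term];
            }
        }
        double[][] result = new double[numTopics][numTerms];
        for (int topic = 0; topic < numTopics; topic++) {
            for (int term = 0; term < numTerms; term++) {
                result[topic][term] = colSums[term] > 0.0 ? counts[topic][term] / colSums[term] : 0.0;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up counts for 2 topics and 3 terms; note that raw entries exceed 1.
        double[][] counts = {{2.0, 6.0, 2.0}, {3.0, 1.0, 0.0}};
        System.out.println(java.util.Arrays.deepToString(pTermGivenTopic(counts)));
        System.out.println(java.util.Arrays.deepToString(pTopicGivenTerm(counts)));
    }
}
```

After the row normalization every row sums to 1, so no single p(term | topic) value can exceed 1.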
On Thu, Dec 20, 2012 at 1:13 PM, Andy Schlaikjer <[email protected]> wrote:

> Hey Sam,
>
> How are you deriving p(word | topic) from the output data? Note
> from the javadoc of org.apache.mahout.clustering.lda.cvb.TopicModel:
>
> /**
>  * Thin wrapper around a {@link Matrix} of counts of occurrences of (topic, term) pairs. Dividing
>  * {@code topicTermCount.viewRow(topic).get(term)} by the sum over the values for all terms in that
>  * row yields p(term | topic). Instead dividing it by all topic columns for that term yields
>  * p(topic | term).
>  *
>  * Multithreading is enabled for the {@code update(Matrix)} method: this method is async, and
>  * merely submits the matrix to a work queue. When all work has been submitted,
>  * {@code awaitTermination()} should be called, which will block until updates have been
>  * accumulated.
>  */
> public class TopicModel implements Configurable, Iterable<MatrixSlice> {
>
> Andy
>
> On Thu, Dec 20, 2012 at 8:11 AM, Sampath Jayarathna <[email protected]> wrote:
>
> > Hi,
> > When I run Mahout LDA using cvb0_local some of the p(word | topic)
> > probability values are coming up >1.
> > I guess this is something to do with the number of decimal digits to
> > display as the output per each probability.
> > Is there a place where we can change this to a precision with Doubles? or
> > is this some kind of a bug in the LDA output?
> >
> > Thanks
> >
> > Sam
