Do we need to do such a transformation? I thought the output of LDA cvb0_local was already the topic probability distribution per document and the term probability distribution per topic.
If both of these outputs are probabilities, I'm not sure why there are some values >1. I thought it's something to do with decimal digits precision?

Thanks
Sam

On Thu, Dec 20, 2012 at 1:48 PM, Andy Schlaikjer <[email protected]> wrote:

> I see. Yes, you'll have to perform the transform suggested in the javadoc
> to get proper p(term | topic) from the output data.
>
> Cheers,
> Andy
>
> On Thu, Dec 20, 2012 at 11:33 AM, Sampath Jayarathna <[email protected]> wrote:
>
> > Hi Andy, I'm not deriving anything by the way. I guess it should be
> > corrected to p (term | topic) from the output.
> >
> > On Thu, Dec 20, 2012 at 1:13 PM, Andy Schlaikjer <[email protected]> wrote:
> >
> > > Hey Sam, How are you deriving p(word | topic) from the output data? Note
> > > from the javadoc of org.apache.mahout.clustering.lda.cvb.TopicModel:
> > >
> > > /**
> > >  * Thin wrapper around a {@link Matrix} of counts of occurrences of (topic, term) pairs. Dividing
> > >  * {@code topicTermCount.viewRow(topic).get(term)} by the sum over the values for all terms in that
> > >  * row yields p(term | topic). Instead dividing it by all topic columns for that term yields
> > >  * p(topic | term).
> > >  *
> > >  * Multithreading is enabled for the {@code update(Matrix)} method: this method is async, and
> > >  * merely submits the matrix to a work queue. When all work has been submitted,
> > >  * {@code awaitTermination()} should be called, which will block until updates have been
> > >  * accumulated.
> > >  */
> > > public class TopicModel implements Configurable, Iterable<MatrixSlice> {
> > >
> > > Andy
> > >
> > > On Thu, Dec 20, 2012 at 8:11 AM, Sampath Jayarathna <[email protected]> wrote:
> > >
> > > > Hi,
> > > > When I run Mahout LDA using cvb0_local some of the p(word | topic)
> > > > probability values are coming up >1.
> > > > I guess this is something to do with the number of decimal digits to
> > > > display as the output per each probability.
> > > > Is there a place where we can change this to a precision with Doubles? or
> > > > is this some kind of a bug in the LDA output?
> > > >
> > > > Thanks
> > > >
> > > > Sam
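
For reference, the transform described in the TopicModel javadoc quoted above could be sketched roughly as below. This is only an illustration under assumptions: plain double[][] arrays stand in for Mahout's Matrix, and the class and method names (TopicTermNormalizer, normalizeRows, normalizeColumns) are hypothetical, not part of Mahout. Since the javadoc describes the matrix entries as counts, values above 1 would be expected until each row is divided by its sum.

```java
import java.util.Arrays;

// Illustrative sketch only: plain arrays stand in for Mahout's Matrix,
// and the class and method names here are hypothetical, not Mahout API.
class TopicTermNormalizer {

    /** Divides each row by its own sum, so row t becomes p(term | topic t). */
    static double[][] normalizeRows(double[][] counts) {
        double[][] out = new double[counts.length][];
        for (int t = 0; t < counts.length; t++) {
            double sum = 0.0;
            for (double c : counts[t]) {
                sum += c;
            }
            out[t] = new double[counts[t].length];
            for (int w = 0; w < counts[t].length; w++) {
                // An all-zero row is left as zeros to avoid dividing by zero.
                out[t][w] = sum > 0.0 ? counts[t][w] / sum : 0.0;
            }
        }
        return out;
    }

    /** Divides each column by its own sum, so column w becomes p(topic | term w).
     *  Assumes a rectangular matrix (every row has the same length). */
    static double[][] normalizeColumns(double[][] counts) {
        int topics = counts.length;
        int terms = topics == 0 ? 0 : counts[0].length;
        double[] colSums = new double[terms];
        for (double[] row : counts) {
            for (int w = 0; w < terms; w++) {
                colSums[w] += row[w];
            }
        }
        double[][] out = new double[topics][terms];
        for (int t = 0; t < topics; t++) {
            for (int w = 0; w < terms; w++) {
                // A term with no counts in any topic is left as zeros.
                out[t][w] = colSums[w] > 0.0 ? counts[t][w] / colSums[w] : 0.0;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up counts for 2 topics over 3 terms, purely for demonstration.
        double[][] counts = {{2.0, 6.0, 2.0}, {1.0, 1.0, 2.0}};
        System.out.println(Arrays.deepToString(normalizeRows(counts)));
        System.out.println(Arrays.deepToString(normalizeColumns(counts)));
    }
}
```

After normalizeRows, every non-empty row sums to 1 and no entry exceeds 1, which is the property the raw output lacks. The values would first have to be read from the cvb0_local output into an array; that step is not shown here.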
