Hey Sam, how are you deriving p(word | topic) from the output data? Note
the following from the javadoc of
org.apache.mahout.clustering.lda.cvb.TopicModel:

/**
 * Thin wrapper around a {@link Matrix} of counts of occurrences of (topic, term) pairs.  Dividing
 * {@code topicTermCount.viewRow(topic).get(term)} by the sum over the values for all terms in that
 * row yields p(term | topic).  Instead dividing it by all topic columns for that term yields
 * p(topic | term).
 *
 * Multithreading is enabled for the {@code update(Matrix)} method: this method is async, and
 * merely submits the matrix to a work queue.  When all work has been submitted,
 * {@code awaitTermination()} should be called, which will block until updates have been
 * accumulated.
 */
public class TopicModel implements Configurable, Iterable<MatrixSlice> {
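
To make the normalization in that javadoc concrete, here is a minimal sketch.
It deliberately does not use the Mahout API: a plain double[][] stands in for
the topic-term count matrix (rows are topics, columns are terms), and the class
and method names are hypothetical. If the values you are reading are raw
counts, they can exceed 1 until they are normalized along these lines.

```java
// Hypothetical helper, not part of Mahout. A plain double[][] stands in for
// the topic-term count matrix: counts[topic][term].
final class TopicTermNormalizer {

    // p(term | topic): divide each count in the topic's row by the row sum.
    static double[] termGivenTopic(double[][] counts, int topic) {
        double[] row = counts[topic];
        double rowSum = 0.0;
        for (double c : row) {
            rowSum += c;
        }
        double[] p = new double[row.length];
        for (int term = 0; term < row.length; term++) {
            // An all-zero row has no defined distribution; return zeros.
            p[term] = rowSum == 0.0 ? 0.0 : row[term] / rowSum;
        }
        return p;
    }

    // p(topic | term): divide each count in the term's column by the column sum.
    static double[] topicGivenTerm(double[][] counts, int term) {
        double colSum = 0.0;
        for (double[] row : counts) {
            colSum += row[term];
        }
        double[] p = new double[counts.length];
        for (int topic = 0; topic < counts.length; topic++) {
            // An all-zero column has no defined distribution; return zeros.
            p[topic] = colSum == 0.0 ? 0.0 : counts[topic][term] / colSum;
        }
        return p;
    }

    public static void main(String[] args) {
        // Made-up counts for 2 topics and 2 terms, purely for illustration.
        double[][] counts = {{2.0, 6.0}, {1.0, 1.0}};
        System.out.println(java.util.Arrays.toString(termGivenTopic(counts, 0)));
        System.out.println(java.util.Arrays.toString(topicGivenTerm(counts, 1)));
    }
}
```

Each returned array sums to 1 (unless the row or column is all zeros), so no
individual probability can come out above 1.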

Andy



On Thu, Dec 20, 2012 at 8:11 AM, Sampath Jayarathna <[email protected]> wrote:

> Hi,
>      When I run Mahout LDA using cvb0_local, some of the p(word | topic)
> probability values come out greater than 1.
> I guess this has something to do with the number of decimal digits
> displayed for each probability.
> Is there a place where we can change this to use double precision, or
> is this some kind of bug in the LDA output?
>
> Thanks
>
> Sam
>
