Hey Sam,

How are you deriving p(word | topic) from the output data? Note the
following from the Javadoc of org.apache.mahout.clustering.lda.cvb.TopicModel:
/**
 * Thin wrapper around a {@link Matrix} of counts of occurrences of (topic, term) pairs. Dividing
 * {@code topicTermCount.viewRow(topic).get(term)} by the sum over the values for all terms in that
 * row yields p(term | topic). Instead dividing it by all topic columns for that term yields
 * p(topic | term).
 *
 * Multithreading is enabled for the {@code update(Matrix)} method: this method is async, and
 * merely submits the matrix to a work queue. When all work has been submitted,
 * {@code awaitTermination()} should be called, which will block until updates have been
 * accumulated.
 */
public class TopicModel implements Configurable, Iterable<MatrixSlice> {
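
To make the normalization in that Javadoc concrete, here is a minimal sketch in plain Java. It does not use the Mahout API: the class TopicTermProbabilities, its two methods, and the counts array are hypothetical and made up for illustration, with counts[topic][term] standing in for the topic-term count matrix. If the values you are reading are the raw matrix entries, they would be counts rather than probabilities, so values above 1 would be expected until each row is divided by its sum.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Mahout: it works on a plain double[][]
// where counts[topic][term] holds the (topic, term) count.
class TopicTermProbabilities {

    // p(term | topic): divide each entry of the topic's row by the row sum.
    // Assumes the row sum is nonzero.
    static double[] termGivenTopic(double[][] counts, int topic) {
        double rowSum = 0.0;
        for (double c : counts[topic]) {
            rowSum += c;
        }
        double[] p = new double[counts[topic].length];
        for (int term = 0; term < p.length; term++) {
            p[term] = counts[topic][term] / rowSum;
        }
        return p;
    }

    // p(topic | term): divide each entry of the term's column by the column sum.
    // Assumes the column sum is nonzero.
    static double[] topicGivenTerm(double[][] counts, int term) {
        double colSum = 0.0;
        for (double[] row : counts) {
            colSum += row[term];
        }
        double[] p = new double[counts.length];
        for (int topic = 0; topic < p.length; topic++) {
            p[topic] = counts[topic][term] / colSum;
        }
        return p;
    }

    public static void main(String[] args) {
        // Made-up counts for 2 topics and 3 terms; the raw entries exceed 1.
        double[][] counts = {
            {2.0, 3.0, 5.0},
            {1.0, 1.0, 8.0}
        };
        System.out.println(Arrays.toString(termGivenTopic(counts, 0)));
        System.out.println(Arrays.toString(topicGivenTerm(counts, 0)));
    }
}
```

After either normalization the returned values sum to 1, so no single entry can exceed 1.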
Andy
On Thu, Dec 20, 2012 at 8:11 AM, Sampath Jayarathna <[email protected]> wrote:
> Hi,
> When I run Mahout LDA using cvb0_local, some of the p(word | topic)
> probability values come out greater than 1.
> I guess this has something to do with the number of decimal digits
> displayed for each probability in the output.
> Is there a place where we can change this to double precision, or
> is this some kind of bug in the LDA output?
>
> Thanks
>
> Sam
>