I'm using the CVB variant of LDA, and when I tried to run LDAPrintTopics I
noticed that the topicModel output datatypes have changed from those of the
original LDA implementation.  So I figured I'd write my own version for CVB
and base it on the LDA implementation.

In doing so, I noticed something odd.  When LDAPrintTopics runs, it gathers
the top N terms per topic (topWordsForTopics) and normalizes the values in
the vector, which makes sense.  But during the normalization it also weights
each value by using Math.exp(score) rather than the straight score.

I get that Math.exp(score) gives larger values exponentially stronger
weighting than smaller ones, but why is this done during normalization?
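
To make the difference concrete, here's a minimal sketch of the two
normalizations I mean.  The class and scores are hypothetical, and this is
not Mahout's actual code:

    import java.util.Arrays;

    public class NormalizationSketch {

      // Straight normalization: divide each score by the sum of all scores.
      static double[] normalize(double[] scores) {
        double sum = 0.0;
        for (double s : scores) {
          sum += s;
        }
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
          out[i] = scores[i] / sum;
        }
        return out;
      }

      // Exp-weighted normalization, as I read LDAPrintTopics: pass each
      // score through Math.exp() first, then do the same divide-by-sum.
      static double[] expNormalize(double[] scores) {
        double[] exp = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
          exp[i] = Math.exp(scores[i]);
        }
        return normalize(exp);
      }

      public static void main(String[] args) {
        double[] scores = {-2.0, -1.0, -0.5};  // e.g. scores on a log scale
        System.out.println(Arrays.toString(normalize(scores)));
        System.out.println(Arrays.toString(expNormalize(scores)));
      }
    }

Part of what I'm trying to confirm is whether the stored scores are
log-probabilities, since in that case exp-then-normalize would simply
recover the underlying probability distribution.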

And if I were going to use the topicModel output as the input to some other
algorithm, would I want to run the topicModel vectors through the same kind
of weighted normalization first?  And if so, why not just persist the
topicModel in this weighted, normalized format in the first place?
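
Concretely, I mean something like this for a downstream consumer, reusing
the hypothetical expNormalize() from the sketch above (the double[][] here
is a stand-in, not Mahout's actual topicModel type):

    // Hypothetical pre-processing: apply the same exp-weighted
    // normalization to every topic row before handing the model on.
    static double[][] expNormalizeRows(double[][] topicModel) {
      double[][] weighted = new double[topicModel.length][];
      for (int t = 0; t < topicModel.length; t++) {
        weighted[t] = expNormalize(topicModel[t]);
      }
      return weighted;
    }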

And finally, should I use this same weighted normalization on the docTopics
output as well?  The docTopics values are normalized (well, they all add up
to 1), but are they normalized in the same manner?
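
For what it's worth, this is the kind of quick check I was planning to run
on a docTopics row to see whether it's already a plain probability
distribution (again hypothetical code, not Mahout's API):

    // Returns true if the row looks like an L1-normalized probability
    // distribution: non-negative entries that sum to (roughly) 1.0.
    static boolean looksLikeProbabilities(double[] docTopicRow) {
      double sum = 0.0;
      for (double p : docTopicRow) {
        if (p < 0.0) {
          return false;  // negative entries would suggest a log scale
        }
        sum += p;
      }
      return Math.abs(sum - 1.0) < 1e-6;
    }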

I'm just trying to work out how to use the LDA output, and whether there
are any steps I need to take before using it as input to something else.

-- 

Thanks,
John C
