Github user feynmanliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/7916#discussion_r36149160
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala ---
@@ -214,29 +214,61 @@ class LocalLDAModel private[clustering] (
gammaShape)
}
+  /**
+   * Calculates a lower bound on the log likelihood of the entire corpus and inferred topics.
+   * Note that this bound sums two parts:
+   *  - a bound on the log likelihood of the corpus, which scales with corpus size.
+   *    See [[logLikelihood()]].
+   *  - a bound on the log likelihood of the estimated topics (topic-term distributions),
+   *    which does not scale with corpus size.
+   *    See [[topicsLogLikelihood()]].
+   *
+   * See Equation (16) in the original Online LDA paper.
+   *
+   * @param documents test corpus to use for calculating log likelihood
+   * @return variational lower bound on the log likelihood of the entire corpus and inferred topics
+   */
+  def fullLogLikelihood(documents: RDD[(Long, Vector)]): Double = {
--- End diff ---
nit: *joint*LogLikelihood
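The scaladoc under review describes the bound as the sum of a corpus term that grows with corpus size and a topics term that does not. As a toy illustration of that decomposition (not Spark's actual implementation; the bound values and helper names here are made up for demonstration):

```python
# Toy illustration of the decomposition in the scaladoc above: the joint
# variational bound is a corpus term plus a topics term. All numbers are
# stand-ins, NOT the real ELBO computed by Spark's LocalLDAModel.

def corpus_log_likelihood_bound(docs):
    # Stand-in for the per-document ELBO: a fixed contribution per document,
    # so this term scales linearly with corpus size.
    return sum(-10.0 for _ in docs)

def topics_log_likelihood_bound():
    # Stand-in for the bound on the topic-term distributions: a constant
    # that does not depend on the number of documents.
    return -42.0

def joint_log_likelihood_bound(docs):
    # Equation (16)-style sum: corpus bound + topics bound.
    return corpus_log_likelihood_bound(docs) + topics_log_likelihood_bound()

docs = ["doc"] * 100
print(joint_log_likelihood_bound(docs))        # -1042.0
print(joint_log_likelihood_bound(docs * 2))    # -2042.0: only the corpus term doubled
```

Doubling the corpus changes only the first term, which is why the scaladoc distinguishes the part that "scales with corpus size" from the part that does not.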