GitHub user akopich commented on a diff in the pull request:
https://github.com/apache/spark/pull/18924#discussion_r143084875
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala ---
@@ -503,21 +533,22 @@ final class OnlineLDAOptimizer extends LDAOptimizer {
}
/**
- * Update alpha based on `gammat`, the inferred topic distributions for documents in the
- * current mini-batch. Uses Newton-Rhapson method.
+ * Update alpha based on `logphat`.
+ * Uses Newton-Rhapson method.
* @see Section 3.3, Huang: Maximum Likelihood Estimation of Dirichlet Distribution Parameters
* (http://jonathan-huang.org/research/dirichlet/dirichlet.pdf)
+ * @param logphat Expectation of estimated log-posterior distribution of
+ * topics in a document averaged over the batch.
+ * @param nonEmptyDocsN number of non-empty documents
*/
- private def updateAlpha(gammat: BDM[Double]): Unit = {
+ private def updateAlpha(logphat: BDV[Double], nonEmptyDocsN : Double): Unit = {
--- End diff --
Otherwise the methods would have to cast `nonEmptyDocsN: Int` to `Double`
explicitly. Declaring the parameter as `Double` makes the conversion happen
implicitly, but the method is private, so I don't think it's going to hurt.
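
A minimal sketch of the trade-off described above, not the code from the PR: `WideningSketch` and `scaleByDocs` are hypothetical names standing in for a private method that takes a `Double` parameter. It assumes only Scala's standard numeric widening, where an `Int` argument is converted to `Double` at the call site.

```scala
object WideningSketch {
  // Hypothetical stand-in for a method that declares the count as Double,
  // so the body can divide without any cast.
  def scaleByDocs(total: Double, nonEmptyDocsN: Double): Double =
    total / nonEmptyDocsN

  def main(args: Array[String]): Unit = {
    val nonEmptyDocsN: Int = 4

    // The Int argument is widened to Double implicitly at the call site.
    val viaWidening = scaleByDocs(10.0, nonEmptyDocsN)

    // The explicit alternative: the caller performs the cast itself.
    val viaExplicit = scaleByDocs(10.0, nonEmptyDocsN.toDouble)

    println(viaWidening == viaExplicit)
  }
}
```

Both calls compute the same value; the only difference is whether the conversion is written out by the caller or left to the compiler.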