Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19337#discussion_r143818776
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala ---
@@ -224,6 +224,24 @@ private[clustering] trait LDAParams extends Params
with HasFeaturesCol with HasM
/**
* For Online optimizer only: [[optimizer]] = "online".
*
+ * A (positive) learning parameter that controls the convergence of variational inference.
+ * Smaller value will lead to more accuracy model and longer training time.
--- End diff ---
Sorry to be troublesome here. I'm not sure a smaller value for the
convergence tolerance will always lead to a more accurate model. A smaller value
will make the statistics for each batch fit the training data more closely (possibly even
overfitting), so I'm not sure the overall model accuracy will be better,
especially for data outside the training dataset.
Maybe just "Smaller value will lead to a more converged model and longer
training time"?
Please also add documentation for the default value.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]