Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19337#discussion_r143818776
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala ---
    @@ -224,6 +224,24 @@ private[clustering] trait LDAParams extends Params 
with HasFeaturesCol with HasM
       /**
        * For Online optimizer only: [[optimizer]] = "online".
        *
    +   * A (positive) learning parameter that controls the convergence of 
variational inference.
    +   * Smaller value will lead to more accuracy model and longer training 
time.
    --- End diff --
    
    Sorry to be troublesome here. I'm not sure a smaller value for the 
convergence tolerance will always lead to a more accurate model. A smaller 
value will make the statistics for each batch fit the training data more 
closely (even overfit it), but I'm not sure the overall model accuracy will be 
better, especially on data outside the training dataset.
    
    Maybe just "Smaller value will lead to a more converged model and longer 
training time"?
    
    Please also add doc for the default value.
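
    Combining the two suggestions, the Scaladoc might read something like the 
sketch below (the default value is a placeholder here, since it is not visible 
in this diff hunk and should be filled in with the actual default from the 
param definition):

    ```scala
    /**
     * For Online optimizer only: [[optimizer]] = "online".
     *
     * A (positive) learning parameter that controls the convergence of
     * variational inference. Smaller value will lead to a more converged
     * model and longer training time.
     *
     * Default: (document the actual default value here)
     * @group expertParam
     */
    ```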



---
