Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/4419#issuecomment-94041947
  
    Responding to comments in [https://github.com/apache/spark/pull/4807]:
    
    {quote}
    Another question is about existing parameters in LDA:
    Except K, all other parameters (Alpha, Beta, Maxiteration, seed, 
checkPointInterval) are useless or have different default values for Online 
LDA. I'm not sure if we should move all those parameters to EM optimizer.
    {quote}
    --> I disagree.  OnlineLDA could take most of these parameters, with 
caveats:
    * alpha, beta: These are hyperparameters of LDA.  EM does not estimate 
them, though it could be modified to do so.  The Online LDA algorithm you are 
following does estimate them.  I'd recommend:
      * LDA takes these parameters as fixed values.
      * Online LDA takes a special parameter ```estimateAlphaBeta: Boolean``` 
which indicates whether it should estimate these hyperparameters.  The 
implementation should be able to either update these values or leave them 
fixed without much difficulty.
    * maxIteration
      * As I suggested before, I'd recommend that OnlineLDA take 
```numIterations``` and ```miniBatchFraction``` instead of ```batchNumber``` 
(to mimic GradientDescent).  ```numIterations``` will be shared by all LDA 
algorithms, but ```miniBatchFraction``` will be specific to OnlineLDA.
    * seed: OnlineLDA uses randomness in sampling and should use a random seed.
    
    I agree that ```checkpointInterval``` is not applicable to Online LDA.
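
    A rough sketch of how the split described above could look, assuming the 
shared settings sit in one place and each algorithm carries its own extras.  
All names here (```SharedLDAParams```, ```EMParams```, ```OnlineParams```, 
```describe```) and all default values are hypothetical placeholders for 
illustration, not the MLlib API:

```scala
// Hypothetical sketch only: these types are NOT part of MLlib.

// Settings shared by every LDA algorithm.
final case class SharedLDAParams(
    k: Int,
    alpha: Double,
    beta: Double,
    numIterations: Int,
    seed: Long) {
  require(k > 0, s"k must be positive, got $k")
  require(numIterations > 0, s"numIterations must be positive, got $numIterations")
}

// Algorithm-specific settings live with the algorithm that uses them.
sealed trait LDAAlgorithmParams

// EM treats alpha and beta as fixed; checkpointing applies only here.
// The default of 10 is an arbitrary placeholder.
final case class EMParams(checkpointInterval: Int = 10) extends LDAAlgorithmParams {
  require(checkpointInterval > 0, s"checkpointInterval must be positive, got $checkpointInterval")
}

// Online LDA adds miniBatchFraction and the estimateAlphaBeta switch.
// The defaults are arbitrary placeholders.
final case class OnlineParams(
    miniBatchFraction: Double = 0.05,
    estimateAlphaBeta: Boolean = false) extends LDAAlgorithmParams {
  require(miniBatchFraction > 0.0 && miniBatchFraction <= 1.0,
    s"miniBatchFraction must be in (0, 1], got $miniBatchFraction")
}

object LDAParamsDemo {
  // Summarizes which settings a given algorithm would actually consume.
  def describe(shared: SharedLDAParams, algo: LDAAlgorithmParams): String = algo match {
    case EMParams(interval) =>
      s"EM: k=${shared.k}, numIterations=${shared.numIterations}, " +
        s"checkpointInterval=$interval"
    case OnlineParams(fraction, estimate) =>
      s"Online: k=${shared.k}, numIterations=${shared.numIterations}, " +
        s"miniBatchFraction=$fraction, estimateAlphaBeta=$estimate"
  }
}
```

    With this layout, ```numIterations``` and ```seed``` are set once and 
reused by both algorithms, while ```checkpointInterval``` never appears on 
the online side.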
    
    {quote}
    Actually I find LDA and OnlineLDA share quite few things and it's kind of 
difficult to merge them together. Maybe for OnlineLDA, separating it to another 
File is a better choice . (Later I'll provide an interface / example for 
stream).
    {quote}
    I agree that having the interface and the different algorithms in separate 
files is probably best.


