Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/4807#issuecomment-89374472
  
    Here's a proposal.  Let me know what you think!
    
    @hhbyyh 
    
    > 1. Should different algorithms have different entrance in LDA, like 
runGibbs, runOnline, runEM? I kinda like it as the separation looks simple and 
clear.
    
    Multiple run methods do make that separation clearer, but they also force 
beginner users (who don't know what these algorithms are) to choose an 
algorithm before they can try LDA.  I'd prefer to keep a single run() method 
and specify the algorithm as a String parameter.
    
    One con of a single run() method is that users will get back an LDAModel 
which they will need to cast to a more specific type (if they want to use the 
specialization's extra functionality).  I think we could eliminate this issue 
later on by opening up each algorithm as its own Estimator (so that LDA would 
become a meta-Estimator, if you will).
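    
    A minimal sketch of the casting issue, assuming hypothetical stand-in 
    classes (EMLDAModel, OnlineLDAModel, and logLikelihood here are 
    illustrative names, not the actual MLlib API):
    ```scala
    object LdaCastSketch {
      // Hypothetical stand-ins; not the actual MLlib classes.
      trait LDAModel { def k: Int }

      class EMLDAModel(val k: Int) extends LDAModel {
        // Extra functionality only this specialization offers (made-up value).
        def logLikelihood: Double = -1.0
      }

      class OnlineLDAModel(val k: Int) extends LDAModel

      // A single run() returns the general type, whatever the algorithm.
      def run(algorithm: String, k: Int): LDAModel = algorithm match {
        case "EM"     => new EMLDAModel(k)
        case "online" => new OnlineLDAModel(k)
        case other    =>
          throw new IllegalArgumentException(s"Unknown algorithm: $other")
      }

      // Callers must match on the runtime type to reach the extra methods.
      def describe(model: LDAModel): String = model match {
        case em: EMLDAModel => s"EM model, k=${em.k}, ll=${em.logLikelihood}"
        case m              => s"generic model, k=${m.k}"
      }
    }
    ```
    With per-algorithm entry points (or per-algorithm Estimators), the 
    return type could be the specific model class and the match above would 
    not be needed.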
    
    > 2. Online LDA have several specific arguments. What's the recommended 
place to put them and their getter/setter, in LDA or optimizer ?
    
    That is an issue, for sure.  I'd propose:
    ```
    trait Optimizer // no public API
    class EMOptimizer extends Optimizer {
      // public API: getters/setters for EM-specific parameters
      // private[mllib] API: methods for learning
    }
    class LDA {
      def setOptimizer(optimizer: String) // takes "EM" / "Gibbs" / "online"
      // takes Optimizer instance which user can configure beforehand
      def setOptimizer(optimizer: Optimizer)
      def getOptimizer: Optimizer
    }
    ```
    For users, Optimizer classes simply store algorithm-specific parameters.  
Users can use the default Optimizer, or they can specify the optimizer via 
String (with default algorithm parameters) or via Optimizer (with configured 
algorithm parameters).
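    
    To make the three usage modes concrete, here is a rough, compilable 
    sketch. Everything beyond the names in the outline above is an 
    assumption for illustration: OnlineOptimizer, GibbsOptimizer, the 
    miniBatchFraction parameter and its default, and EM as the default 
    optimizer are all hypothetical, and the learning methods are omitted.
    ```scala
    object LdaOptimizerSketch {
      // Marker trait with no public members, as in the outline above.
      trait Optimizer

      class EMOptimizer extends Optimizer
      class GibbsOptimizer extends Optimizer // hypothetical placeholder

      // Hypothetical online optimizer with one algorithm-specific parameter.
      class OnlineOptimizer extends Optimizer {
        private var miniBatchFraction: Double = 0.05 // illustrative default

        def getMiniBatchFraction: Double = miniBatchFraction

        def setMiniBatchFraction(f: Double): this.type = {
          require(f > 0.0 && f <= 1.0, s"fraction must be in (0, 1], got $f")
          miniBatchFraction = f
          this
        }
      }

      class LDA {
        private var optimizer: Optimizer = new EMOptimizer // assumed default

        def getOptimizer: Optimizer = optimizer

        // Takes an Optimizer the user has configured beforehand.
        def setOptimizer(optimizer: Optimizer): this.type = {
          this.optimizer = optimizer
          this
        }

        // Takes a name and uses that algorithm's default parameters.
        def setOptimizer(name: String): this.type = name.toLowerCase match {
          case "em"     => setOptimizer(new EMOptimizer)
          case "gibbs"  => setOptimizer(new GibbsOptimizer)
          case "online" => setOptimizer(new OnlineOptimizer)
          case other    =>
            throw new IllegalArgumentException(s"Unknown optimizer: $other")
        }
      }
    }
    ```
    Under these assumptions, a beginner could call `new LDA()` or 
    `new LDA().setOptimizer("online")`, while an advanced user could pass 
    `new OnlineOptimizer().setMiniBatchFraction(0.1)` to setOptimizer.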
    
    @EntilZha It might be easiest to revert to master (to make diffs easier), 
but you can decide.  It would be great if you have time to work on it in the 
next couple of days, thanks.  Unfortunately, I'll be out of town (but online) 
on Wednesday, but I hope it goes well!


