GitHub user hhbyyh opened a pull request:

    https://github.com/apache/spark/pull/5661

    [Spark-7090][MLlib] Introduce LDAOptimizer to LDA to further improve extensibility

    jira: https://issues.apache.org/jira/browse/SPARK-7090 
    
    LDA was implemented with extensibility in mind. With OnlineLDA and Gibbs
    Sampling now under development, we are collecting more detailed requirements
    from the different algorithms.
    As Joseph Bradley proposed in https://github.com/apache/spark/pull/4807, and
    after some further discussion, we'd like to adjust the code structure slightly
    so that the common interface and the extension point are clearly visible.
    Class LDA becomes the common entry point for LDA computation. Each LDA
    object refers to an LDAOptimizer, which provides the concrete algorithm
    implementation. Users can configure an LDAOptimizer with algorithm-specific
    parameters and assign it to the LDA object.
    
    
    Concrete changes:
    
    1. Add a trait `LDAOptimizer`, which defines the common interface for
    concrete implementations. Each subclass wraps a specific LDA algorithm.
    
    2. Move EMOptimizer to the file LDAOptimizer, make it inherit from
    LDAOptimizer, and rename it to EMLDAOptimizer (in case a more generic
    EMOptimizer is added in the future).
            - Adjust the constructor of EMOptimizer, since all parameters
              should be passed in through the initialState method. This avoids
              confusion and accidental overwrites.
            - Move the code from LDA.initialState to initialState of
              EMLDAOptimizer.
    
    3. Add a property ldaOptimizer to LDA, together with its getter and setter.
    EMLDAOptimizer is the default optimizer.
    
    4. Change the return type of LDA.run from DistributedLDAModel to LDAModel.
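
The structure described in the changes above can be sketched roughly as follows. This is a minimal, Spark-free illustration and not the code in this pull request: the model classes are toy stand-ins, documents are plain `Seq[Seq[Int]]`, and the names `next`, `getLDAModel`, `getOptimizer`, `setOptimizer`, `setK` and `setMaxIterations` as well as all signatures are assumptions made for the example. Only `LDAOptimizer`, `EMLDAOptimizer`, `initialState`, `ldaOptimizer`, `LDA.run`, `LDAModel` and `DistributedLDAModel` are names taken from the description.

```scala
// Toy stand-ins for the model types; NOT the Spark MLlib definitions.
trait LDAModel { def k: Int }
final case class DistributedLDAModel(k: Int, iterations: Int) extends LDAModel

// Common interface that each concrete LDA algorithm implements.
trait LDAOptimizer {
  // All parameters arrive here via the LDA instance, not via the constructor
  // (hypothetical signature).
  def initialState(docs: Seq[Seq[Int]], lda: LDA): LDAOptimizer
  // Performs one iteration (hypothetical method).
  def next(): LDAOptimizer
  // Builds the resulting model (hypothetical method).
  def getLDAModel: LDAModel
}

// Stand-in for the EM implementation; it only counts iterations.
class EMLDAOptimizer extends LDAOptimizer {
  private var k = 0
  private var iterations = 0

  override def initialState(docs: Seq[Seq[Int]], lda: LDA): LDAOptimizer = {
    k = lda.getK
    iterations = 0
    this
  }

  override def next(): LDAOptimizer = { iterations += 1; this }

  override def getLDAModel: LDAModel = DistributedLDAModel(k, iterations)
}

// Common entry point: holds shared parameters and delegates to the optimizer.
class LDA {
  private var k: Int = 10
  private var maxIterations: Int = 20
  // EMLDAOptimizer is the default optimizer.
  private var ldaOptimizer: LDAOptimizer = new EMLDAOptimizer

  def getK: Int = k
  def setK(value: Int): this.type = { k = value; this }
  def setMaxIterations(value: Int): this.type = { maxIterations = value; this }

  // Hypothetical getter/setter names for the ldaOptimizer property.
  def getOptimizer: LDAOptimizer = ldaOptimizer
  def setOptimizer(optimizer: LDAOptimizer): this.type = { ldaOptimizer = optimizer; this }

  // Returns the base type LDAModel, so optimizers may produce different models.
  def run(docs: Seq[Seq[Int]]): LDAModel = {
    val state = ldaOptimizer.initialState(docs, this)
    (0 until maxIterations).foreach(_ => state.next())
    state.getLDAModel
  }
}
```

Under these assumptions, a caller would write something like `new LDA().setK(3).setOptimizer(new EMLDAOptimizer).run(docs)`. Because `run` returns the base type, a caller that needs a `DistributedLDAModel` has to pattern-match or cast the result.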
    
    Further work:
    Add OnlineLDAOptimizer and other possible optimizers once they are ready.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hhbyyh/spark ldaRefactor

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5661.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5661
    
----
commit ec2f857645bdcabc8f51c310237d0365e7d2230e
Author: Yuhao Yang <[email protected]>
Date:   2015-04-22T12:49:37Z

    protoptype for discussion

commit 0bb8400e70011c8f97ece31d395a8c75b15bab4f
Author: Yuhao Yang <[email protected]>
Date:   2015-04-23T11:15:04Z

    refactor LDA with Optimizer

----


