GitHub user jkbradley opened a pull request:

    https://github.com/apache/spark/pull/9513

    [WIP] [SPARK-5565] [ML] LDA wrapper for Pipelines API

    This adds LDA to spark.ml, the Pipelines API.  It follows the design doc in
    the JIRA: [https://issues.apache.org/jira/browse/SPARK-5565], with one major
    change:
    * I eliminated doc IDs.  These are not necessary with DataFrames since the
      user can add an ID column as needed.
    
    **WIP**: The wrapper is done, but I need to add unit tests.  Submitting as
    WIP to get review started early.
    
    Note: This will conflict with [https://github.com/apache/spark/pull/9484],
    but I'll try to merge that PR first and then rebase this one.
    
    CC: @hhbyyh @feynmanliang  If you have a chance to make a pass, that'd be
    really helpful--thanks!  Now that I'm done traveling & this PR is almost
    ready, I'll see about reviewing other PRs critical for 1.6.
    
    CC: @mengxr 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jkbradley/spark lda-pipelines

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9513.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9513
    
----
commit c053d0af1015d64391636be889f387455d73c38c
Author: Joseph K. Bradley <[email protected]>
Date:   2015-11-05T19:24:15Z

    partly done adding LDA

commit 23d40c4298fd5c63b4555ae404958c30e8b0c842
Author: Joseph K. Bradley <[email protected]>
Date:   2015-11-06T01:09:40Z

    done adding LDA.  need to add tests

commit 583e173741d4cd32c40f92904e60432d15119e5e
Author: Joseph K. Bradley <[email protected]>
Date:   2015-11-06T01:25:41Z

    fix indentation

----


