GitHub user jkbradley opened a pull request:
https://github.com/apache/spark/pull/9513
[WIP] [SPARK-5565] [ML] LDA wrapper for Pipelines API
This adds LDA to spark.ml, the Pipelines API. It follows the design doc in
the JIRA: [https://issues.apache.org/jira/browse/SPARK-5565], with one major
change:
* I eliminated doc IDs. These are not necessary with DataFrames since the
user can add an ID column as needed.
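
A minimal sketch of the "add an ID column as needed" approach, for illustration only. It assumes the `spark.ml` LDA API as it appears in released Spark 2.x and later (`org.apache.spark.ml.clustering.LDA`, `SparkSession`, `monotonically_increasing_id`), which may differ from the code in this WIP PR. The column name `docId`, the object name, and the toy term-count vectors are hypothetical.

```scala
import org.apache.spark.ml.clustering.LDA
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.monotonically_increasing_id

object LdaWithDocIds {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("lda-doc-ids")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Toy term-count vectors, one row per document (hypothetical data).
    val docs = Seq(
      Vectors.dense(1.0, 0.0, 2.0, 0.0),
      Vectors.dense(0.0, 3.0, 0.0, 1.0),
      Vectors.dense(2.0, 0.0, 1.0, 0.0)
    ).map(Tuple1.apply).toDF("features")

    // LDA does not need document IDs; the user attaches one only if wanted.
    // The generated IDs are unique but not necessarily consecutive.
    val withIds = docs.withColumn("docId", monotonically_increasing_id())

    val model = new LDA()
      .setK(2)
      .setMaxIter(10)
      .setFeaturesCol("features")
      .fit(withIds)

    // Extra columns such as docId pass through transform unchanged, so each
    // topic distribution can still be matched back to its document.
    model.transform(withIds).select("docId", "topicDistribution").show(false)

    spark.stop()
  }
}
```

Because the ID is an ordinary DataFrame column, users can also supply their own identifiers (for example, a filename or database key) instead of generated ones.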
**WIP**: The wrapper is done, but I need to add unit tests. Submitting as
WIP to get review started early.
Note: This will conflict with [https://github.com/apache/spark/pull/9484],
but I'll try to get that PR merged first and then rebase this one.
CC: @hhbyyh @feynmanliang If you have a chance to make a pass, that'd be
really helpful--thanks! Now that I'm done traveling & this PR is almost ready,
I'll see about reviewing other PRs critical for 1.6.
CC: @mengxr
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jkbradley/spark lda-pipelines
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/9513.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #9513
----
commit c053d0af1015d64391636be889f387455d73c38c
Author: Joseph K. Bradley <[email protected]>
Date: 2015-11-05T19:24:15Z
partly done adding LDA
commit 23d40c4298fd5c63b4555ae404958c30e8b0c842
Author: Joseph K. Bradley <[email protected]>
Date: 2015-11-06T01:09:40Z
done adding LDA. need to add tests
commit 583e173741d4cd32c40f92904e60432d15119e5e
Author: Joseph K. Bradley <[email protected]>
Date: 2015-11-06T01:25:41Z
fix indentation
----