GitHub user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/1269#issuecomment-69236610
@akopich I had hoped to get this into MLlib, but after more consideration,
I believe it is too complex. Currently, what would be ideal is a simple
implementation of LDA since that is all that most users need. While
generalizations like Robust PLSA may outperform LDA with proper tuning, it's
somewhat of a research area, and it may be better to go with LDA since it has
been very widely tested and used.
However, I am sure some users would want to use your implementation of
Robust PLSA, so it would be valuable for you to make it available as a package
for Spark.
The best path right now, I believe, is to create a simple PR with a
minimal public API, where that API should be extensible with (a) extra
parameters/features and (b) alternate optimization/learning algorithms. I've
posted a public design doc on the LDA JIRA
[here](https://issues.apache.org/jira/browse/SPARK-1405), and I'm going to
submit such a PR. I would of course appreciate your feedback on it. Thanks
very much for your understanding.
When we merge the initial LDA PR, @mengxr will be sure to include all of
those who have participated as authors of Spark LDA PRs: @akopich @witgo
@yinxusen @dlwh @EntilZha @jegonzal
CC: @mengxr