Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/4419#issuecomment-97253799
Also, there are a few to-do items:
* unit tests
  * This is the big item. Do you have an idea of how you plan to test
    this? Some things, such as getters and setters, will be easy to test, but
    the algorithm itself may be difficult. Some possibilities are:
    * Break the algorithm into pieces, and test each piece against
      hand-computed values.
    * Test 1 iteration of the algorithm with miniBatchFraction = 1.0 on a
      tiny dataset, and compare against values computed using Blei's code (or
      some other reference implementation).
  * Also, Java tests will be nice to make sure the API works for Java.
    These don't need to do much beyond calling all methods to make sure the
    method calls compile and run in Java.
* example app: This would be nice to have and hopefully could involve a
  slight modification of the current LDAExample.
* programming guide update: This will be a small update to the LDA section
  in the clustering guide.

The example app and programming guide update can be in follow-up PRs or in
this one.
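
To make the "break the algorithm into pieces" suggestion concrete, here is a minimal Python sketch. It is not code from this PR. It assumes the optimizer uses a decaying step size of the form rho_t = (tau0 + t)^(-kappa) and blends the old parameters with a per-batch estimate; that form and the names `learning_rate`, `blend`, `tau0`, and `kappa` are hypothetical, chosen only for illustration. The point is that each small piece can be checked against values worked out by hand.

```python
import math


def learning_rate(iteration, tau0, kappa):
    """Hypothetical step-size schedule: rho_t = (tau0 + t)^(-kappa)."""
    return (tau0 + iteration) ** (-kappa)


def blend(old, estimate, rho):
    """Hypothetical update: move each entry of `old` toward `estimate` by weight rho."""
    return [(1.0 - rho) * o + rho * e for o, e in zip(old, estimate)]


def check_against_hand_computed_values():
    # Hand computation: tau0 = 3, t = 1, kappa = 0.5 gives (3 + 1)^(-0.5) = 0.5
    rho = learning_rate(1, tau0=3.0, kappa=0.5)
    assert math.isclose(rho, 0.5)

    # Hand computation: 0.5 * [1, 3] + 0.5 * [3, 1] = [2, 2]
    updated = blend([1.0, 3.0], [3.0, 1.0], rho)
    assert all(math.isclose(u, 2.0) for u in updated)
    return rho, updated


if __name__ == "__main__":
    check_against_hand_computed_values()
```

The same pattern would carry over to a Scala test suite: pick inputs small enough that the expected numbers can be derived on paper, then assert with a tolerance rather than exact equality.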