[GitHub] spark pull request: [SPARK-5563][mllib] online lda initial checkin

jkbradley Tue, 28 Apr 2015 16:01:09 -0700

Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/4419#issuecomment-97253799
  
    Also, there are a few to-do items:
    * unit tests
      * This is the big item.  Do you have an idea of how you plan to test this?  Some things, such as getters and setters, will be easy to test, but the algorithm itself may be harder.  Some possibilities are:
        * Break the algorithm into pieces and test each piece against hand-computed values.
        * Test 1 iteration of the algorithm with miniBatchFraction = 1.0 on a tiny dataset, and compare against values computed using Blei's code (or some other reference implementation).
      * Also, Java tests would be nice, to make sure the API works from Java.  These don't need to do much beyond calling every method to make sure the calls compile and run in Java.
    * example app: This would be nice to have, and could hopefully be a slight modification of the current LDAExample.
    * programming guide update: This will be a small update to the LDA section in the clustering guide.
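    As a rough sketch of the first testing strategy (breaking the algorithm into pieces and checking each against hand-computed values), a piece-level check might look like the Java below. It assumes the online update follows the step-size schedule rho_t = (tau0 + t)^(-kappa) and the blending rule lambda <- (1 - rho_t) * lambda + rho_t * lambdaHat from Hoffman, Blei and Bach's online variational Bayes for LDA; whether this PR uses exactly that form is an assumption. The class and method names (`OnlineLdaPieces`, `learningRate`, `blend`) are hypothetical and are not part of the MLlib API.

```java
import java.util.Arrays;

/**
 * Hypothetical helper pieces of an online LDA update, split out so each one
 * can be unit-tested against hand-computed values. Illustrative names only;
 * not part of Spark MLlib.
 */
public final class OnlineLdaPieces {

    private OnlineLdaPieces() {}

    /** Step size rho_t = (tau0 + t)^(-kappa), as in Hoffman et al.'s online LDA schedule (assumed). */
    public static double learningRate(double tau0, int iteration, double kappa) {
        return Math.pow(tau0 + iteration, -kappa);
    }

    /** Blends current topic parameters with the mini-batch estimate: (1 - rho) * lambda + rho * lambdaHat. */
    public static double[] blend(double[] lambda, double[] lambdaHat, double rho) {
        if (lambda.length != lambdaHat.length) {
            throw new IllegalArgumentException("lambda and lambdaHat must have the same length");
        }
        double[] out = new double[lambda.length];
        for (int i = 0; i < lambda.length; i++) {
            out[i] = (1.0 - rho) * lambda[i] + rho * lambdaHat[i];
        }
        return out;
    }

    /** Compares a computed value with a hand-computed expectation, within a small tolerance. */
    static void assertClose(double expected, double actual) {
        if (Math.abs(expected - actual) > 1e-12) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        // Hand-computed: (3 + 1)^(-0.5) = 1 / sqrt(4) = 0.5
        double rho = learningRate(3.0, 1, 0.5);
        assertClose(0.5, rho);

        // Hand-computed: 0.5 * [2, 4] + 0.5 * [6, 0] = [4, 2]
        double[] updated = blend(new double[] {2.0, 4.0}, new double[] {6.0, 0.0}, rho);
        assertClose(4.0, updated[0]);
        assertClose(2.0, updated[1]);

        System.out.println("rho = " + rho + ", lambda = " + Arrays.toString(updated));
    }
}
```

    Keeping each piece a small pure function means the hand-computed cases stay tiny; the full-iteration comparison against a reference implementation would still be needed to catch errors in how the pieces are wired together.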
    
    The example app and programming guide can be in follow-up PRs, or in this one.



