Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/1290#issuecomment-54901132
  
    @bgreeven Thanks for contributing ANN! I made a quick scan over the code. 
Some high-level comments:
    
    1. The user guide is for normal users and it should focus on how to use 
ANN. If we want to leave some notes for developers, we can append a section at 
the end.
    2. We don't ask users to treat unit tests as demos or examples. Instead, we 
put a short code snippet in the user guide and put a complete example under 
`examples/`.
    3. `GeneralizedModel` and `GeneralizedAlgorithm` are definitely out of the 
scope of this PR and they should not live under `mllib.ann`. We can make a 
separate JIRA to discuss the APIs. Could you remove them in this PR?
    4. `predict` outputs the prediction for the first node. Would the first 
node be the only special node? How about having `predict(v)` output the full 
prediction and `predict(v, i)` output the prediction to the i-th node?
    5. Could you try to use LBFGS instead of GradientDescent?
    6. Please replace for loops by while loops. The latter is faster in Scala.
    7. Please follow the [Spark Code Style 
Guide](https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide)
 and update the style, e.g.,
      a. remove space after `(` and before `)`
      b. add ScalaDoc for all public classes and methods
      c. line width should be smaller than 100 chars (in both main and test)
      d. some verification code are left as comments, please find a way to move 
them to unit tests
      e. organize imports into groups and order them alphabetically within each 
group
      f. do not add `return` or `;` unless they are necessary
    8. Please use existing unit tests as templates. For example, please rename 
`TestParallelANN` to `ParallelANNSuite` and use `LocalSparkContext` and 
`FunSuite` for Spark setup and test assertions. Remove `main()` from unit tests.
    9. Is `Parallel` necessary in the name `ParallelANN`?
    10. What methods and classes are public? Could they be private or package 
private? Please generate the API doc and check the public ones.
    
    Given the size of the PR, I hope we can split it into multiple PRs. My 
recommendation would be
    
    1) Remove user guide, `GeneralizedAlgorithm`, and `GeneralizedModel` and 
focus on the implementation and unit tests.
    2) Add user guide and example code.
    3) Discuss interface for generalized algorithm and model.
    
    I suggest removing the user guide because we may need to go through 
multiple iterations for the code review of the implementation. It is not 
necessary to update the user guide with each update.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to