GitHub user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/1290#issuecomment-69237765
  
    @bgreeven  I’m not too surprised that the majority vote (a.k.a. one vs. 
all) did not do very well; it does not scale well with the number of classes.  
A tree (or better yet, error-correcting output codes) generally works better, 
in my experience.
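
    To make the error-correcting output codes idea concrete, here is a minimal 
sketch in plain Python (not Spark code).  The code matrix, the names 
`ECOC_CODES`, `ONE_VS_ALL_CODES`, `decode`, and `min_distance` are all made up 
for illustration, and the hard 0/1 decoding is a simplification (one vs. all 
is usually decoded by picking the highest score).

```python
from itertools import combinations
from typing import List, Sequence

# Hypothetical 4-class, 7-bit code matrix: one codeword (row) per class,
# one binary classifier per column. Any two rows differ in 4 positions.
ECOC_CODES: List[List[int]] = [
    [0, 0, 0, 0, 0, 0, 0],  # class 0
    [1, 1, 1, 1, 0, 0, 0],  # class 1
    [1, 1, 0, 0, 1, 1, 0],  # class 2
    [1, 0, 1, 0, 1, 0, 1],  # class 3
]

# One vs. all written the same way: classifier k fires only for class k.
ONE_VS_ALL_CODES: List[List[int]] = [
    [1 if i == j else 0 for j in range(4)] for i in range(4)
]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two bit vectors disagree."""
    return sum(x != y for x, y in zip(a, b))


def min_distance(codes: Sequence[Sequence[int]]) -> int:
    """Smallest Hamming distance between any two codewords."""
    return min(hamming(a, b) for a, b in combinations(codes, 2))


def decode(bits: Sequence[int], codes: Sequence[Sequence[int]]) -> int:
    """Return the class whose codeword is closest to the predicted bits."""
    return min(range(len(codes)), key=lambda k: hamming(bits, codes[k]))
```

    With this matrix, `decode([1, 1, 0, 0, 1, 1, 1], ECOC_CODES)` still 
returns class 2 even though the last binary classifier is wrong, because the 
codewords are at least 4 bits apart.  The one-vs.-all rows are only 2 bits 
apart, so a single wrong classifier can already produce a tie.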
    
    @avulanov  True, we try for consistency with APIs, except where we’re 
changing the norm.  There is not a clear write-up about the “norm,” 
although the new spark.ml package and its design doc (in the JIRA) give an 
overview of some parts.  Basically, we’re aiming to make things more 
pluggable and extensible, while minimizing API change.  If that requires 
short-term API changes (such as switching away from ANNWithX method names), 
that can be acceptable.
    
    @bgreeven @avulanov The test results look pretty good, though I’m not 
sure what to expect for accuracy.  I think the main item remaining is figuring 
out the public API.  It’s tough since neural networks and deep learning are 
rapidly evolving fields, and there are a lot of model and algorithm variants out 
there.  Ideally, we could put together a design doc (to be linked from the 
JIRA) for this big feature which would:
    
    * Design a public API for neural networks and deep learning
        * Comparison of other major libraries’ APIs
        * Minimum viable product API for an initial PR
        * Path for the future:
            * What extensions might we need to do, and can we keep the public 
API stable for these?
            * What extensions might users want to do?  Is the API easily 
extensible and/or pluggable, or can we make it so in the future without 
changing the existing public API?
    * Briefly discuss the algorithm
        * Algorithm sketch, limitations, etc.
        * Alternative algorithms, and a path for making the optimization 
algorithm pluggable in the future (as we’ve discussed a bit in the PR 
conversation)
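
    As a rough illustration of what a pluggable optimization algorithm could 
look like, here is a small sketch in plain Python.  It is not Spark’s API and 
not the PR’s code: `Optimizer`, `GradientDescent`, `MomentumDescent`, and `fit` 
are hypothetical names, and the point is only that the training entry point 
depends on an interface rather than on one concrete algorithm.

```python
from typing import Callable, List, Protocol

Vector = List[float]
Gradient = Callable[[Vector], Vector]


class Optimizer(Protocol):
    """Hypothetical interface: anything with a matching minimize() plugs in."""

    def minimize(self, gradient: Gradient, initial: Vector) -> Vector: ...


class GradientDescent:
    """Plain fixed-step gradient descent."""

    def __init__(self, step_size: float = 0.1, iterations: int = 100) -> None:
        self.step_size = step_size
        self.iterations = iterations

    def minimize(self, gradient: Gradient, initial: Vector) -> Vector:
        w = list(initial)
        for _ in range(self.iterations):
            g = gradient(w)
            w = [wi - self.step_size * gi for wi, gi in zip(w, g)]
        return w


class MomentumDescent:
    """Gradient descent with a classical (heavy-ball) momentum term."""

    def __init__(self, step_size: float = 0.1, momentum: float = 0.5,
                 iterations: int = 100) -> None:
        self.step_size = step_size
        self.momentum = momentum
        self.iterations = iterations

    def minimize(self, gradient: Gradient, initial: Vector) -> Vector:
        w = list(initial)
        v = [0.0] * len(w)
        for _ in range(self.iterations):
            g = gradient(w)
            v = [self.momentum * vi - self.step_size * gi
                 for vi, gi in zip(v, g)]
            w = [wi + vi for wi, vi in zip(w, v)]
        return w


def fit(gradient: Gradient, initial: Vector, optimizer: Optimizer) -> Vector:
    """Training entry point that only knows about the Optimizer interface."""
    return optimizer.minimize(gradient, initial)
```

    With a layout like this, swapping in another algorithm would mean adding a 
class with a `minimize` method, and `fit` itself would not need to change.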
    
    I realize it takes quite a while to get a big new feature ready.  If 
you’d like to encourage early adoption, you could also post this for now as a 
package for Spark, while the PR is made fully ready.
    
    CC: @mengxr 

