Github user avulanov commented on the pull request:

    https://github.com/apache/spark/pull/1290#issuecomment-69371908
  
    @jkbradley I totally agree that we need an extensible API for ANN. The 
problem is that it depends on the MLlib API, which is not flexible enough yet. 
I wonder if we can bring Spark and MLlib developers into this discussion. Until 
then, our only option is to implement workarounds. Some of my concerns:
       - Widen the `Gradient` interface to pass (input, output) instead of 
`Vector, Double`. Currently we stack the input and output into a single Vector 
and then unstack it in `Gradient` on every call
       - Pluggable optimizers and updaters versus the `AlgorithmWithXXX` pattern
       - Matrix-based batch processing of data
       - Asynchronous gradient update 
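
    To illustrate the first concern, here is a minimal sketch in plain Python. 
It is not MLlib code: the names `stack`, `unstack`, `gradient_stacked` and 
`gradient_pair` are hypothetical, and a single linear layer with squared error 
stands in for the network. It only shows the extra split that a narrow, 
single-vector interface forces on every gradient call, compared with an 
interface that receives (input, output) directly.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def stack(x: List[float], y: List[float]) -> List[float]:
    """Workaround: pack input and target output into one flat vector."""
    return list(x) + list(y)


def unstack(stacked: List[float], input_size: int) -> Tuple[List[float], List[float]]:
    """Undo stack(): split the flat vector back into (input, output)."""
    return stacked[:input_size], stacked[input_size:]


def gradient_pair(x: List[float], y: List[float], weights: Matrix) -> Tuple[Matrix, float]:
    """Widened interface (hypothetical): takes input and output separately.

    Assumes a single linear layer, prediction = weights * x, with loss
    0.5 * sum((prediction - y)^2). Returns (gradient w.r.t. weights, loss).
    """
    predicted = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    errors = [p - t for p, t in zip(predicted, y)]
    grad = [[e * xi for xi in x] for e in errors]
    loss = 0.5 * sum(e * e for e in errors)
    return grad, loss


def gradient_stacked(stacked: List[float], input_size: int, weights: Matrix) -> Tuple[Matrix, float]:
    """Narrow interface (hypothetical): one vector in, so it must be split
    again on every call before the real computation can start."""
    x, y = unstack(stacked, input_size)
    return gradient_pair(x, y, weights)
```

    Both functions return the same gradient and loss. The difference is the 
per-example copy in `stack` and the split in `unstack`, which the widened 
interface would avoid.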
    
    Actually, my main concern at the moment is scalability. If you want to use 
MLlib with small datasets such as MNIST and shallow networks, you are OK, but 
you can do that with plenty of other machine learning libraries as well. When 
it comes to bigger data, bigger clusters, and even deep neural networks, certain 
MLlib interfaces become the bottleneck.

