GitHub user avulanov commented on the pull request:
https://github.com/apache/spark/pull/1290#issuecomment-69371908
@jkbradley I totally agree that we need an extensible API for ANN. The
problem is that it depends on the MLlib API, which is not that flexible yet. I
wonder if we can bring Spark & MLlib developers into this discussion... Until
then our only option is to implement workarounds. Some of my concerns:
- Widen the `Gradient` interface to pass (input, output) instead of
`Vector, Double`. Currently we stack input and output into a single `Vector`
and then unstack it in `Gradient` on every call
- Pluggable optimizers and updaters versus the `AlgorithmWithXXX` pattern
- Matrix-based batch processing of data
- Asynchronous gradient update
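
To make the first point concrete, here is a rough sketch of the stacking workaround next to a widened interface. It uses plain Scala arrays rather than MLlib types so that it runs on its own. `StackingSketch`, `WideGradient` and `LinearSquaredError` are hypothetical names for illustration only, not existing MLlib classes, and the single linear layer with squared error is just a stand-in for a real network.

```scala
// Illustrative sketch only: plain Scala arrays, no Spark dependency.
// All names here are hypothetical and do not exist in MLlib.
object StackingSketch {

  // Workaround: append the multi-dimensional target to the input so that
  // both fit into the single data vector the gradient receives.
  def stack(input: Array[Double], target: Array[Double]): Array[Double] =
    input ++ target

  // The gradient must undo the packing on every call (two array copies).
  def unstack(stacked: Array[Double], inputSize: Int): (Array[Double], Array[Double]) =
    stacked.splitAt(inputSize)

  // Hypothetical widened interface: input and target are passed separately,
  // so no stacking or unstacking is needed.
  trait WideGradient {
    // Returns (gradient with respect to weights, loss).
    def compute(input: Array[Double],
                target: Array[Double],
                weights: Array[Double]): (Array[Double], Double)
  }

  // Stand-in model: one linear layer with squared error.
  // Weights are stored row-major, one row per output unit.
  object LinearSquaredError extends WideGradient {
    def compute(input: Array[Double],
                target: Array[Double],
                weights: Array[Double]): (Array[Double], Double) = {
      val nIn = input.length
      val nOut = target.length
      val grad = new Array[Double](weights.length)
      var loss = 0.0
      for (o <- 0 until nOut) {
        var pred = 0.0
        for (i <- 0 until nIn) pred += weights(o * nIn + i) * input(i)
        val err = pred - target(o)
        loss += 0.5 * err * err
        for (i <- 0 until nIn) grad(o * nIn + i) = err * input(i)
      }
      (grad, loss)
    }
  }
}
```

With the current workaround every example goes through `stack` once and `unstack` on each gradient evaluation. With something like `WideGradient` both copies would go away.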
Actually, my main concern right now is scalability. If you want to use MLlib
with small datasets such as MNIST and shallow networks, then you are OK, but
you can do that just as well with plenty of other machine learning libraries.
When it comes to bigger data, bigger clusters and even deep neural networks,
certain MLlib interfaces become the bottleneck.