GitHub user avulanov opened a pull request:
https://github.com/apache/spark/pull/7621
[SPARK-2352] [ML] Add Artificial Neural Network (ANN) to Spark
### Summary
This pull request contains the following features for ML:
- Multilayer Perceptron regressor
- Multilayer Perceptron classifier
This implementation is based on our initial pull request with @bgreeven:
https://github.com/apache/spark/pull/1290 and is inspired by very insightful
suggestions from @mengxr and @witgo (I would also like to thank everyone else
in that thread for the useful discussions). The original code was
extensively tested and benchmarked. Since then, I've addressed the two main
requirements that prevented the code from being merged into the main branch:
- Extensible interface, so that new types of networks are easy to implement
  - The main building blocks are the traits `Layer` and `LayerModel`, which
    are used to construct the layers of an ANN. New layers can be added by
    extending these traits. The traits are private in this release to leave
    room to improve them based on community feedback
  - Back propagation is implemented in a general form, so it (the
    optimization algorithm) does not need to change when new layers are
    implemented
- Speed and scalability: the implementation has to be comparable in speed to
  state-of-the-art single-node implementations
  - The benchmark developed for large ANNs shows that the proposed code is on
    par with a C++ CPU implementation and scales nicely with the number of
    workers. Details can be found here: https://github.com/avulanov/ann-benchmark
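To illustrate the idea of a layer-agnostic back propagation, here is a minimal sketch; it is not the code in this pull request. Only the trait names `Layer` and `LayerModel` come from the description above. The method names and signatures (`initModel`, `eval`, `prevDelta`), the two example layers, and the `AnnSketch` object are all assumptions made for illustration, and gradients with respect to layer parameters are left out to keep it short.

```scala
// Hypothetical sketch: the trait names are taken from the PR text,
// every signature below is an assumption, not the actual Spark code.

/** A trained or initialized layer that can run forward and backward. */
trait LayerModel {
  /** Forward pass: output of this layer for the given input. */
  def eval(input: Array[Double]): Array[Double]

  /** Backward pass: given this layer's input, its output, and the error
    * gradient with respect to the output, return the gradient with
    * respect to the input. */
  def prevDelta(input: Array[Double],
                output: Array[Double],
                delta: Array[Double]): Array[Double]
}

/** A layer description that knows how to create its model. */
trait Layer {
  def initModel(): LayerModel
}

/** Example layer (assumed): multiplies every element by a fixed factor. */
final case class ScaleLayer(factor: Double) extends Layer {
  def initModel(): LayerModel = new LayerModel {
    def eval(input: Array[Double]): Array[Double] = input.map(_ * factor)
    def prevDelta(input: Array[Double],
                  output: Array[Double],
                  delta: Array[Double]): Array[Double] = delta.map(_ * factor)
  }
}

/** Example layer (assumed): element-wise sigmoid. */
case object SigmoidLayer extends Layer {
  def initModel(): LayerModel = new LayerModel {
    def eval(input: Array[Double]): Array[Double] =
      input.map(v => 1.0 / (1.0 + math.exp(-v)))
    // Derivative of the sigmoid expressed through its output: o * (1 - o).
    def prevDelta(input: Array[Double],
                  output: Array[Double],
                  delta: Array[Double]): Array[Double] =
      output.zip(delta).map { case (o, d) => d * o * (1.0 - o) }
  }
}

object AnnSketch {
  /** Returns the network input followed by the output of each layer. */
  def forward(models: Seq[LayerModel], x: Array[Double]): Seq[Array[Double]] =
    models.scanLeft(x)((in, m) => m.eval(in))

  /** Propagates the output error back to the network input. This loop only
    * uses the LayerModel trait, so new layer types need no change here. */
  def backward(models: Seq[LayerModel],
               acts: Seq[Array[Double]],
               outputDelta: Array[Double]): Array[Double] =
    models.indices.reverse.foldLeft(outputDelta) { (delta, i) =>
      models(i).prevDelta(acts(i), acts(i + 1), delta)
    }
}
```

In this sketch a network is just a sequence of models, for example `Seq[Layer](ScaleLayer(2.0), SigmoidLayer).map(_.initModel())`. Adding a new kind of layer means adding one more `Layer`/`LayerModel` pair, while `forward` and `backward` stay untouched.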
### Other implementations based on the proposed interface
- DBN and RBM by @witgo
https://github.com/witgo/spark/tree/ann-interface-gemm-dbn
- Dropout https://github.com/avulanov/spark/tree/ann-interface-gemm
@mengxr and @dbtsai kindly agreed to perform code review.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/avulanov/spark SPARK-2352-ann
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/7621.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #7621
----
commit a2261330c227be8ef26172dbe355a617d653553a
Author: Alexander Ulanov <[email protected]>
Date: 2015-07-23T14:55:15Z
Multilayer Perceptron regressor and classifier
ANN test
----