GitHub user avulanov commented on the pull request:
https://github.com/apache/spark/pull/1290#issuecomment-68004911
I did a few measurements of the recent performance optimizations of
`ANNLeastSquaresGradient`. I used a cluster of 6 machines (Xeon 3.3 GHz, 4
cores, 16 GB RAM) with 12 workers in total and the mnist8m dataset, and trained
`ANNClassifier` for 40 iterations with no hidden layer, i.e., a 784x10
topology. The error on the MNIST test set was around 9%.

| | Before optimization | After optimization |
|---|---|---|
| Total time (hh:mm:ss) | 00:47:55 | 00:16:58 |
| Avg. step time | 51 s | 23 s |

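
The ratios implied by the table can be checked with a few lines of Scala. `Speedup` and its members are illustrative names only, and the only inputs are the figures from the table.

```scala
// Illustrative helper: computes the speedups implied by the benchmark table.
object Speedup {
  // Converts an "hh:mm:ss" duration string into seconds.
  def toSeconds(hms: String): Int = hms.split(":").map(_.toInt) match {
    case Array(h, m, s) => h * 3600 + m * 60 + s
    case _ => throw new IllegalArgumentException(s"expected hh:mm:ss, got: $hms")
  }

  // Total wall-clock time: 00:47:55 before vs. 00:16:58 after.
  val totalSpeedup: Double =
    toSeconds("00:47:55").toDouble / toSeconds("00:16:58").toDouble

  // Average time per step: 51 s before vs. 23 s after.
  val stepSpeedup: Double = 51.0 / 23.0
}
```

This gives about 2.82x for the total time (2875 s vs. 1018 s) and about 2.22x for the average step time.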
Total time dropped by ~2.8x and the average step time by ~2.2x. The
optimization should be even more evident with bigger configurations, i.e.,
larger weight matrices.
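
To make the scaling argument concrete: with no hidden layer, the per-example least-squares gradient with respect to the weights is an outer product of the output error term and the input. Its cost therefore grows with the size of the weight matrix (784x10 here). The Scala sketch below assumes sigmoid outputs and the loss 0.5 * sum_k (o_k - t_k)^2. `SingleLayerLeastSquares` and `gradient` are hypothetical names. This is neither the PR's `ANNLeastSquaresGradient` nor its optimized version.

```scala
// Hypothetical sketch: least-squares gradient for a single-layer network
// (no hidden layer) with sigmoid outputs, for one training example.
object SingleLayerLeastSquares {
  private def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  /** Returns (loss, gradW, gradB).
    * w: nOut x nIn weight matrix, b: nOut biases, x: input, t: one-hot target.
    */
  def gradient(
      w: Array[Array[Double]],
      b: Array[Double],
      x: Array[Double],
      t: Array[Double]): (Double, Array[Array[Double]], Array[Double]) = {
    // Forward pass: o_k = sigmoid(b_k + sum_i w(k)(i) * x(i))
    val out = Array.tabulate(w.length) { k =>
      var z = b(k)
      var i = 0
      while (i < x.length) { z += w(k)(i) * x(i); i += 1 }
      sigmoid(z)
    }
    // Loss = 0.5 * sum_k (o_k - t_k)^2
    val loss = 0.5 * out.indices.map(k => (out(k) - t(k)) * (out(k) - t(k))).sum
    // delta_k = dLoss/dz_k = (o_k - t_k) * o_k * (1 - o_k); this is also dLoss/db_k
    val delta = Array.tabulate(out.length)(k => (out(k) - t(k)) * out(k) * (1.0 - out(k)))
    // dLoss/dw(k)(i) = delta_k * x_i: an nOut x nIn outer product
    val gradW = Array.tabulate(out.length, x.length)((k, i) => delta(k) * x(i))
    (loss, gradW, delta)
  }
}
```

A distributed optimizer would sum such per-example gradients over the training set. The performance optimizations measured above concern how that computation is carried out, which this sketch does not reproduce.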
Update: one can also take a look at the recent comparison with the unreleased
LogisticRegression:
https://github.com/apache/spark/pull/1379#issuecomment-68002991
@jkbradley If we want to be consistent with the current MLlib API, we need
to implement the `ANNwithXXX` pattern, as is done with LogisticRegression
(`LogisticRegressionWithSGD`, `LogisticRegressionWithLBFGS`). I don't like it
either, though. BTW, how can I restart the tests other than by pushing a new
commit?
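
In MLlib, the `XXXWithYYY` convention means one class per model/optimizer pair. Examples are `LogisticRegressionWithSGD` (which also offers `train` helpers on its companion object) and `LogisticRegressionWithLBFGS`. The sketch below shows only the shape such an API might take for an ANN. `ANNWithSGD`, `ANNWithLBFGS`, `ANNAlgorithm` and `ANNModel` are hypothetical names, and training is replaced by a placeholder that just records the configuration.

```scala
// Hypothetical sketch of the MLlib "ModelWithOptimizer" naming pattern for an ANN.
// Training is a placeholder; only the API shape is illustrated.
final case class ANNModel(topology: Seq[Int], optimizer: String, numIterations: Int)

abstract class ANNAlgorithm(val topology: Seq[Int], val numIterations: Int) {
  protected def optimizerName: String

  // Placeholder: a real implementation would fit the weights to `data`.
  def run(data: Seq[(Array[Double], Array[Double])]): ANNModel =
    ANNModel(topology, optimizerName, numIterations)
}

class ANNWithSGD(topology: Seq[Int], numIterations: Int, val stepSize: Double)
    extends ANNAlgorithm(topology, numIterations) {
  protected def optimizerName: String = "SGD"
}

object ANNWithSGD {
  // Static-style entry point, mirroring the shape of LogisticRegressionWithSGD.train.
  def train(
      data: Seq[(Array[Double], Array[Double])],
      topology: Seq[Int],
      numIterations: Int,
      stepSize: Double): ANNModel =
    new ANNWithSGD(topology, numIterations, stepSize).run(data)
}

class ANNWithLBFGS(topology: Seq[Int], numIterations: Int)
    extends ANNAlgorithm(topology, numIterations) {
  protected def optimizerName: String = "LBFGS"
}
```

One drawback of this design is that each additional optimizer needs another `ANNWithXXX` class.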
@bgreeven I like case classes more than the other options considered.
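
One possible shape for a case-class-based API, assuming a single trainer entry point, is an immutable parameter object in which the optimizer is chosen as a value rather than encoded in a class name. `ANNParams`, `SGDParams`, `LBFGSParams` and `ANNConfigSketch` are hypothetical names, and the default values are placeholders, not MLlib's.

```scala
// Hypothetical sketch: configuring one trainer with case classes
// instead of separate ANNwithXXX classes.
sealed trait OptimizerParams
final case class SGDParams(stepSize: Double = 1.0, miniBatchFraction: Double = 1.0)
    extends OptimizerParams
final case class LBFGSParams(numCorrections: Int = 10, convergenceTol: Double = 1e-4)
    extends OptimizerParams

final case class ANNParams(
    topology: Seq[Int],
    numIterations: Int = 100,
    optimizer: OptimizerParams = LBFGSParams())

object ANNConfigSketch {
  // The optimizer is selected by pattern matching on the params value,
  // so supporting a new optimizer means adding a case class, not a trainer class.
  def describe(p: ANNParams): String = {
    val shape = p.topology.mkString("x")
    p.optimizer match {
      case SGDParams(step, fraction) =>
        s"$shape via SGD(step=$step, fraction=$fraction), ${p.numIterations} iterations"
      case LBFGSParams(m, tol) =>
        s"$shape via LBFGS(m=$m, tol=$tol), ${p.numIterations} iterations"
    }
  }
}
```

Because the case classes are immutable, variants can be derived with `copy`, e.g. `ANNParams(Seq(784, 10)).copy(numIterations = 40, optimizer = SGDParams(stepSize = 0.5))`.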