[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an Arti...

avulanov Tue, 23 Dec 2014 14:19:38 -0800

Github user avulanov commented on the pull request:

    https://github.com/apache/spark/pull/1290#issuecomment-68004911
  
    I took a few measurements of the recent performance optimizations of 
`ANNLeastSquaresGradient`. I used a cluster of 6 machines (Xeon 3.3GHz, 4 
cores, 16GB RAM) with 12 workers in total and the mnist8m dataset, and trained 
`ANNClassifier` for 40 iterations with no hidden layer, i.e., a 784x10 
topology. Error on the mnist test set was around 9%.
    
    | | Before optimization | After optimization |
    |---|---|---|
    | Total time (hh:mm:ss) | 00:47:55 | 00:16:58 |
    | Avg step time | 51 s | 23 s |
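
    As a quick sanity check on the timings in the table above, the ratios can 
be computed directly. This is only a sketch: the helper name `to_seconds` is 
made up for illustration, and the numbers are copied from the table.

```python
def to_seconds(hms: str) -> int:
    """Convert an hh:mm:ss string to a number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


# Total wall-clock time for the 40 iterations, before and after.
total_before = to_seconds("00:47:55")
total_after = to_seconds("00:16:58")

# Ratio > 1 means the optimized version is faster.
total_speedup = total_before / total_after
step_speedup = 51 / 23  # average step time in seconds, before / after

print(f"total: {total_speedup:.2f}x, step: {step_speedup:.2f}x")
# prints "total: 2.82x, step: 2.22x"
```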
    
    Total training time dropped by about 2.8x. The optimization should be even 
more evident with bigger configurations, i.e., larger weight matrices.
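
    To give a rough idea of how quickly the weight matrices grow, here is a 
small sketch that counts weights for a fully connected topology. The function 
`weight_count`, the 300-unit hidden layer, and the `bias` switch are all 
assumptions for illustration; they are not taken from the PR code.

```python
def weight_count(topology, bias=True):
    """Count trainable weights in a fully connected feed-forward network.

    `topology` lists layer sizes from input to output. Whether a bias unit
    is added per layer is an assumption, so it is left as a switch.
    """
    extra = 1 if bias else 0
    return sum((n_in + extra) * n_out
               for n_in, n_out in zip(topology, topology[1:]))


# The configuration measured above: no hidden layer, 784 inputs, 10 outputs.
no_hidden = weight_count([784, 10], bias=False)        # 784 * 10

# A hypothetical bigger configuration with one 300-unit hidden layer.
one_hidden = weight_count([784, 300, 10], bias=False)  # 784*300 + 300*10

print(no_hidden, one_hidden)
# prints "7840 238200"
```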
    
    Update: see also the recent comparison with the unreleased 
LogisticRegression: 
https://github.com/apache/spark/pull/1379#issuecomment-68002991
    
    @jkbradley If we want to be consistent with the current MLlib API, we need 
to implement the `ANNwithXXX` pattern, as is done for LogisticRegression. 
I don't like it either, though. BTW, how can I restart the tests other than by 
pushing a new commit?
    @bgreeven I like case classes more than the other options considered.




