Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/5946#issuecomment-118315297
Yes, I can confirm that the benchmarks are consistent (posted below). In the last
commit I also removed some overhead for ndim=2 numpy arrays.
Dot operations:

Vector size 500000, values = 50000, iterations = 100
In this branch: 0.0006000828742980957
In master:      0.06196121454238892

Vector size 50000, values = 5000, iterations = 100
In this branch: 5.4757595062255856e-05
In master:      0.005893096923828125

Vector size 50000, values = 500, iterations = 100
In this branch: 2.2442340850830077e-05
In master:      0.0006871128082275391

Squared distance calculation:

Vector size 500000, values = 50000, iterations = 100
In this branch: 0.0045609426498413085
In master:      2.4458935689926147

Vector size 50000, values = 5000, iterations = 100
In this branch: 0.0005040478706359864
In master:      0.2419515371322632

Vector size 50000, values = 500, iterations = 100
In this branch: 0.0007156062126159667
In master:      0.24092751741409302
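For reference, timings like these can be gathered with a simple averaging harness. This is only a sketch of how such numbers might be produced; the names (`bench`, `SIZE`, `NNZ`) are illustrative and not taken from the PR:

```python
# Hypothetical benchmark harness sketch, not the PR's actual script.
import time
import numpy as np

def bench(fn, iterations=100):
    """Return average seconds per call over `iterations` runs."""
    start = time.time()
    for _ in range(iterations):
        fn()
    return (time.time() - start) / iterations

# Build a random sparse-style vector: NNZ sorted indices into a SIZE-long space.
SIZE, NNZ = 50000, 5000
rng = np.random.RandomState(0)
indices = np.sort(rng.choice(SIZE, NNZ, replace=False))
values = rng.rand(NNZ)
dense = rng.rand(SIZE)

# Time a sparse-against-dense dot product.
avg = bench(lambda: np.dot(values, dense[indices]))
```

The "Vector size" and "values" figures above correspond to `SIZE` and `NNZ` here.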
That works out to almost a 100x speedup for dot and roughly a 500x speedup for
squared distance.
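For illustration, speedups of this magnitude typically come from replacing a pure-Python loop over nonzero entries with a single vectorized numpy call. The sketch below is an assumed minimal example of that pattern, not the PR's actual code; the function names are hypothetical:

```python
import numpy as np

def dot_loop(indices, values, dense):
    # Pure-Python loop over the nonzeros: one interpreter round-trip per entry.
    s = 0.0
    for i, v in zip(indices, values):
        s += v * dense[i]
    return s

def dot_vectorized(indices, values, dense):
    # Fancy-index the dense vector at the sparse indices, then one numpy dot.
    return np.dot(values, dense[indices])

rng = np.random.RandomState(42)
dense = rng.rand(1000)
indices = np.array([1, 7, 300, 999])
values = np.array([0.5, 2.0, -1.0, 3.0])

# Both paths agree; the vectorized one avoids per-element Python overhead.
assert np.isclose(dot_loop(indices, values, dense),
                  dot_vectorized(indices, values, dense))
```

The squared-distance case benefits the same way, since it can also be expressed as a handful of vectorized array operations instead of an element-wise loop.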