Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/5946#issuecomment-118315297
Yes, I can confirm that the benchmarks are consistent (posted below). In the last
commit I also removed some overhead for ndim=2 numpy arrays.
Dot operations:

Vector size 500000, values = 50000, iterations = 100
In this branch: 0.0006000828742980957
In master:      0.06196121454238892

Vector size 50000, values = 5000, iterations = 100
In this branch: 5.4757595062255856e-05
In master:      0.005893096923828125

Vector size 50000, values = 500, iterations = 100
In this branch: 2.2442340850830077e-05
In master:      0.0006871128082275391

Squared distance calculation:

Vector size 500000, values = 50000, iterations = 100
In this branch: 0.0045609426498413085
In master:      2.4458935689926147

Vector size 50000, values = 5000, iterations = 100
In this branch: 0.0005040478706359864
In master:      0.2419515371322632

Vector size 50000, values = 500, iterations = 100
In this branch: 0.0007156062126159667
In master:      0.24092751741409302
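For reference, timings like these can be gathered with a simple averaging harness. This is only a sketch of how such numbers might be produced; the names (`bench`, `SIZE`, `NNZ`) are illustrative and not taken from the PR:

```python
# Hypothetical benchmark harness sketch, not the PR's actual script.
import time
import numpy as np

def bench(fn, iterations=100):
    """Return average seconds per call over `iterations` runs."""
    start = time.time()
    for _ in range(iterations):
        fn()
    return (time.time() - start) / iterations

# Build a random sparse-style vector: NNZ sorted indices into a SIZE-long space.
SIZE, NNZ = 50000, 5000
rng = np.random.RandomState(0)
indices = np.sort(rng.choice(SIZE, NNZ, replace=False))
values = rng.rand(NNZ)
dense = rng.rand(SIZE)

# Time a sparse-against-dense dot product.
avg = bench(lambda: np.dot(values, dense[indices]))
```

The "Vector size" and "values" figures above correspond to `SIZE` and `NNZ` here.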
That works out to almost a 100x speedup for dot and roughly a 500x speedup for
squared distance.
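For illustration, speedups of this magnitude typically come from replacing a pure-Python loop over nonzero entries with a single vectorized numpy call. The sketch below is an assumed minimal example of that pattern, not the PR's actual code; the function names are hypothetical:

```python
import numpy as np

def dot_loop(indices, values, dense):
    # Pure-Python loop over the nonzeros: one interpreter round-trip per entry.
    s = 0.0
    for i, v in zip(indices, values):
        s += v * dense[i]
    return s

def dot_vectorized(indices, values, dense):
    # Fancy-index the dense vector at the sparse indices, then one numpy dot.
    return np.dot(values, dense[indices])

rng = np.random.RandomState(42)
dense = rng.rand(1000)
indices = np.array([1, 7, 300, 999])
values = np.array([0.5, 2.0, -1.0, 3.0])

# Both paths agree; the vectorized one avoids per-element Python overhead.
assert np.isclose(dot_loop(indices, values, dense),
                  dot_vectorized(indices, values, dense))
```

The squared-distance case benefits the same way, since it can also be expressed as a handful of vectorized array operations instead of an element-wise loop.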