Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/10306#issuecomment-171834736
  
    Regarding your local performance test:
    
    1. Make sure an optimized BLAS is installed on your system and loaded 
correctly in the JVM via netlib-java. The difference should be significant at 
3000x3000 (with or without multi-threading).
    2. Your GEMM and AXPY tests are not equivalent. First, they do not 
multiply the same matrices. Second, `axpy(1.0, bb(j), aa(j))` should be 
`axpy(1.0, bb(j), aa(i))`. Otherwise, the AXPY test gets some benefit 
from local caching.
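
    For illustration, here is a minimal sketch of the corrected AXPY loop. It assumes the benchmark iterates over all (i, j) pairs, which may not match your test exactly. The class and method names are hypothetical, and the plain-Java `axpy` only stands in for the BLAS call.

```java
public class AxpyLoopSketch {
    // y += alpha * x: a plain-Java stand-in for BLAS daxpy (assumption, not the netlib-java call).
    static void axpy(double alpha, double[] x, double[] y) {
        for (int k = 0; k < x.length; k++) {
            y[k] += alpha * x[k];
        }
    }

    // Hypothetical benchmark body: accumulate every bb[j] into every aa[i].
    // Indexing the target by i (not j) means each inner step pairs a different
    // source vector with the current target vector.
    static void accumulate(double[][] aa, double[][] bb) {
        for (int i = 0; i < aa.length; i++) {
            for (int j = 0; j < bb.length; j++) {
                axpy(1.0, bb[j], aa[i]);
            }
        }
    }
}
```

    To make the comparison fair, the GEMM test should then run on the same `aa` and `bb` data.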
    
    Could you re-run the test? I will take a look at your implementation.
    



