yma11 commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for 
level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-588608946
 
 
   Hi @srowen and @mengxr, 
   I tested with vector size 256, and nativeBLAS shows a clear performance gain 
over f2jBLAS in axpy (~1.7X), dot (~2.8X), and scal(double, dense) (>1.5X). 
 For MKL, I can confirm from the output that it uses AVX in these methods. 
OpenBLAS, however, does not seem to use AVX in its level-1 routines, based on 
the information in https://github.com/xianyi/OpenBLAS/blob/develop/README.md.
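
   For reference, the three level-1 routines benchmarked above compute the 
following. This is only a minimal pure-Java sketch, assuming dense double 
arrays with unit stride and equal length. The class name `Level1Sketch` is 
hypothetical, and this is not the f2jBLAS or native code path exercised by 
the PR.

```java
import java.util.Arrays;

// Hypothetical reference versions of the level-1 BLAS routines discussed above.
// Assumes dense, unit-stride double vectors of equal length.
final class Level1Sketch {

    // axpy: y := a * x + y (updates y in place)
    static void axpy(double a, double[] x, double[] y) {
        for (int i = 0; i < x.length; i++) {
            y[i] += a * x[i];
        }
    }

    // dot: returns the sum over i of x[i] * y[i]
    static double dot(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }

    // scal: x := a * x (updates x in place)
    static void scal(double a, double[] x) {
        for (int i = 0; i < x.length; i++) {
            x[i] *= a;
        }
    }

    public static void main(String[] args) {
        int n = 256; // vector size used in the test above
        double[] x = new double[n];
        double[] y = new double[n];
        Arrays.fill(x, 1.0);
        Arrays.fill(y, 2.0);

        axpy(0.5, x, y); // every y[i] becomes 2.5
        scal(2.0, x);    // every x[i] becomes 2.0
        System.out.println(dot(x, y)); // 256 * 2.0 * 2.5 = 1280.0
    }
}
```

   Each routine is a single pass over the vectors, which is why SIMD support 
such as AVX matters for these calls.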
   As for MKL_NUM_THREADS and OPENBLAS_NUM_THREADS, limiting the threads to 
1 doesn't always give the best end-to-end performance. We previously tested 
KMeans using HiBench on a 1+4 cluster with MKL configured. Setting the thread 
count to 1 and using the default setting showed no obvious difference in 
performance. By the way, with the default MKL thread setting, MKL gives a 
1.09X end-to-end performance gain over the Java implementation.
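
   To compare the two thread settings on executors, one option is Spark's 
`spark.executorEnv.[EnvironmentVariableName]` property. The helper below is a 
hypothetical sketch that only builds the matching `--conf` arguments for 
spark-submit. The class and method names are assumptions and are not part of 
this PR or of the HiBench setup described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds spark-submit arguments that export the BLAS
// thread-count variables to executors via spark.executorEnv.<NAME>.
final class BlasThreadConf {

    static List<String> executorEnvArgs(int threads) {
        if (threads < 1) {
            throw new IllegalArgumentException("threads must be >= 1");
        }
        List<String> args = new ArrayList<>();
        for (String name : new String[] {"MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"}) {
            args.add("--conf");
            args.add("spark.executorEnv." + name + "=" + threads);
        }
        return args;
    }

    public static void main(String[] args) {
        // Prints the flags for the single-threaded case.
        System.out.println(String.join(" ", executorEnvArgs(1)));
    }
}
```

   Omitting these flags leaves the libraries on their default thread setting, 
which is the other case compared above.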
   I have also updated this PR to revert to the Java implementation for 
scal(sparse) and dspmv(). Please take another look.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
With regards,
Apache Git Services
