yma11 commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-588608946

Hi @srowen and @mengxr,

I tested with vector size 256, and nativeBLAS shows an obvious performance gain over f2jBLAS in axpy (~1.7X), dot (~2.8X) and scal (double, dense) (>1.5X). For MKL, I can confirm from its output that these methods use AVX. For OpenBLAS, however, it seems AVX is not used in level-1 routines, based on the information in https://github.com/xianyi/OpenBLAS/blob/develop/README.md.

As for MKL_NUM_THREADS and OPENBLAS_NUM_THREADS, limiting the thread count to 1 doesn't always give the best end-to-end performance. We previously tested KMeans using HiBench on a 1+4 cluster with MKL configured: setting the thread count to 1 and using the default setting showed no obvious performance difference. By the way, with the default MKL thread setting, MKL gives a 1.09X end-to-end performance gain over the Java implementation.

I have also updated this PR to revert to the Java implementation for scal (sparse) and dspmv(). Please take a further review.
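For readers less familiar with the level-1 routines being benchmarked, here is a minimal sketch, assuming a plain-loop Java backend, of what axpy, dot and scal compute, plus a size-based rule for choosing between a Java and a native backend. Everything here is hypothetical and for illustration only: the `Level1` interface, `JAVA`, `choose` and `NATIVE_L1_THRESHOLD` are not netlib-java's or Spark's API, and the threshold of 256 is picked only because that is the vector size benchmarked above, not because it was measured as a crossover point.

```java
/** Hypothetical sketch of level-1 BLAS routines and a size-based backend choice. */
public class Main {

    /** Hypothetical stand-in for a BLAS backend; NOT netlib-java's interface. */
    interface Level1 {
        void axpy(int n, double a, double[] x, double[] y); // y := a*x + y
        double dot(int n, double[] x, double[] y);          // returns sum of x[i]*y[i]
        void scal(int n, double a, double[] x);             // x := a*x
    }

    /** Plain Java loops, similar in spirit to what a pure-Java (f2j-style) backend runs. */
    static final Level1 JAVA = new Level1() {
        public void axpy(int n, double a, double[] x, double[] y) {
            for (int i = 0; i < n; i++) y[i] += a * x[i];
        }

        public double dot(int n, double[] x, double[] y) {
            double s = 0.0;
            for (int i = 0; i < n; i++) s += x[i] * y[i];
            return s;
        }

        public void scal(int n, double a, double[] x) {
            for (int i = 0; i < n; i++) x[i] *= a;
        }
    };

    /** Hypothetical cutoff: vectors at least this long go to the native backend. */
    static final int NATIVE_L1_THRESHOLD = 256;

    /** Returns the native backend for long vectors and the Java loops for short ones. */
    static Level1 choose(int n, Level1 nativeBackend) {
        return n >= NATIVE_L1_THRESHOLD ? nativeBackend : JAVA;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0};
        double[] y = {4.0, 5.0, 6.0};
        // No native library is loaded in this sketch, so JAVA is passed as the "native" backend too.
        Level1 blas = choose(x.length, JAVA);
        blas.axpy(3, 2.0, x, y);                // y becomes {6.0, 9.0, 12.0}
        System.out.println(blas.dot(3, x, y));  // 1*6 + 2*9 + 3*12, prints 60.0
        blas.scal(3, 0.5, x);                   // x becomes {0.5, 1.0, 1.5}
        System.out.println(x[0] + " " + x[1] + " " + x[2]); // prints 0.5 1.0 1.5
    }
}
```

In a real setup the native backend would wrap a library such as MKL or OpenBLAS; the point of the size check is that per-call overhead of a native call can outweigh any SIMD speedup on short vectors, so only measurements like the ones above can justify a specific cutoff.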
