zhengruifeng edited a comment on issue #28229: URL: https://github.com/apache/spark/pull/28229#issuecomment-616937541
> Did you benchmark with native BLAS on a machine with AVX2 or AVX512? The native optimization takes advantage not only of multi-threading but also of SIMD, cache locality, etc.

I tested with OpenBLAS (`OPENBLAS_NUM_THREADS=1`) on an i7-8850 CPU, which supports AVX2 but not AVX-512.

> I do think it's a good idea! But it's still not a general speedup for all cases; the gain depends on some specific conditions. We still need to keep the general K-Means.

When `k` and `numFeatures` are small, there is not much room for the triangle-inequality optimization. But I guess the same applies to high-level BLAS: suppose `k=2` or `k=64` — I guess `BLAS.gemm` with `k=2` may not gain as much speedup as with `k=64`?

> It's not unusual in other parts of MLlib, such as in BLAS, to switch between sparse/dense cases?

There are some algorithms (in `ml.stat`) that can switch between sparse and dense, but no classification/regression/clustering implementations support it yet.
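For context, the gemm-based distance computation being discussed can be sketched in NumPy (a hedged illustration of the general technique, not Spark's actual implementation): the squared distance expands as `||x - c||^2 = ||x||^2 - 2 x·c + ||c||^2`, so the cross terms for all points and all `k` centers collapse into one dense matrix multiply — the part that native BLAS accelerates with multi-threading and SIMD.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 1000, 32, 64
X = rng.standard_normal((n, d))   # n points with d features
C = rng.standard_normal((k, d))   # k cluster centers

# Naive: one explicit distance computation per (point, center) pair.
naive = np.array([[np.sum((x - c) ** 2) for c in C] for x in X])

# GEMM-based: ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2, where the
# cross term X @ C.T is a single dense matrix multiply.
x_sq = np.sum(X ** 2, axis=1)[:, None]   # shape (n, 1)
c_sq = np.sum(C ** 2, axis=1)[None, :]   # shape (1, k)
gemm = x_sq - 2.0 * (X @ C.T) + c_sq     # shape (n, k)

assert np.allclose(naive, gemm)
```

When `k` is small (e.g. `k=2`), `X @ C.T` is a very thin matrix product, so BLAS has little work over which to amortize its overhead — consistent with the guess above that `k=2` would gain less than `k=64`.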
