zhengruifeng edited a comment on issue #28229:
URL: https://github.com/apache/spark/pull/28229#issuecomment-616937541


   > Did you benchmark with native BLAS on a machine with AVX2 or AVX512? The native optimization takes advantage not only of multi-threading but also of SIMD, cache, etc.
   
   I tested with OpenBLAS (`OPENBLAS_NUM_THREADS=1`) on an i7-8850 CPU, which supports AVX2 but not AVX512.
   
   > I do think it's a good idea! But it's still not a general speedup for all cases; the gain assumes some specific conditions. We still need the general K-Means.
   
   When `k` and `numFeatures` are small, there is not much room for the triangle-inequality optimization. But I guess this also applies to high-level BLAS: with `k=2`, `BLAS.gemm` may not gain as much speedup as with `k=64`.
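   To illustrate why larger `k` favors the GEMM path, here is a rough NumPy sketch (not the MLlib code) of computing all point-to-center squared distances through one matrix-matrix product. The cross term `X @ C.T` is an (n, k) GEMM, so a bigger `k` gives BLAS more work per call to amortize its overhead:

   ```python
   import numpy as np

   def pairwise_sq_dists(X, C):
       # ||x - c||^2 = ||x||^2 + ||c||^2 - 2 * (x . c)
       x_norms = np.sum(X * X, axis=1, keepdims=True)    # shape (n, 1)
       c_norms = np.sum(C * C, axis=1, keepdims=True).T  # shape (1, k)
       cross = X @ C.T                                   # the GEMM: shape (n, k)
       return x_norms + c_norms - 2.0 * cross

   rng = np.random.default_rng(0)
   X = rng.standard_normal((5, 3))   # 5 points, 3 features
   C = rng.standard_normal((2, 3))   # k = 2 centers
   D = pairwise_sq_dists(X, C)

   # Sanity check against the naive per-pair computation
   naive = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
   assert np.allclose(D, naive)
   ```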
   
   > It's not unusual in other parts of MLlib, such as in BLAS, to switch between sparse/dense cases?
   
   There are some algorithms (in `ml.stat`) that can switch between sparse and dense, but no classification/regression/clustering implementations support it now.
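   The kind of dispatch meant here can be sketched as follows (a hypothetical Python/SciPy illustration; the names are not Spark APIs). A dense vector takes a BLAS-backed dot, while a sparse vector only touches its stored entries:

   ```python
   import numpy as np
   from scipy.sparse import csr_matrix, issparse

   def dot_with_center(v, center):
       """Dot product that picks a code path based on the vector's representation."""
       if issparse(v):
           row = v.tocsr()
           # Sparse path: iterate only over the stored (non-zero) entries.
           return float(row.data @ center[row.indices])
       # Dense path: a plain BLAS-backed dot product.
       return float(np.dot(np.asarray(v).ravel(), center))

   center = np.array([1.0, 2.0, 3.0, 4.0])
   dense_v = np.array([0.0, 5.0, 0.0, 2.0])
   sparse_v = csr_matrix(dense_v)
   # Both representations yield the same result: 5*2 + 2*4 = 18
   assert dot_with_center(dense_v, center) == dot_with_center(sparse_v, center)
   ```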


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
