zhengruifeng edited a comment on pull request #30468:
URL: https://github.com/apache/spark/pull/30468#issuecomment-732111713


   It looks like:
   1, `GEMM` is only about 7% slower than master, and I guess it can be further 
accelerated with a native BLAS impl. But it needs a big buffer (**m*n**), which 
I think is somewhat dangerous; maybe we can split a block (whose size is 
optimized for `crossJoin`) into sub-blocks (whose size is optimized for `gemm`) 
to reduce this buffer, but I think that would be too convoluted;
   2, Compared with the `DOT`-based impls, `GEMV` should be a nice choice. It is 
much faster (even with `f2jBLAS`), and the buffer size is relatively small 
(**n**);
   3, 
[Guava.Ordering](https://github.com/google/guava/blob/master/guava/src/com/google/common/collect/Ordering.java#L723)
 is much faster than `BoundedPriorityQueue`. With Guava.Ordering, we do not 
need to create `Tuple2` objects.
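
   To make points 2 and 3 concrete, here is a minimal, self-contained sketch 
in plain Java. It is not the code in this PR: `BlockScoringSketch`, `gemv`, 
`topK` and `recommend` are hypothetical names, the hand-written loop only 
stands in for a BLAS `gemv` call, and the index-based min-heap stands in for 
Guava.Ordering (it is not Guava's algorithm). It only illustrates why the 
`GEMV` path needs a score buffer of size **n** instead of **m*n**, and how 
top-k selection can avoid one object per candidate.

```java
final class BlockScoringSketch {

    // y = A * x, where A is an n x rank row-major matrix (one item factor per row)
    // and x is the user factor starting at users[xOff]. Stand-in for BLAS gemv.
    static void gemv(float[] items, int n, int rank, float[] users, int xOff, float[] y) {
        for (int i = 0; i < n; i++) {
            float sum = 0f;
            int off = i * rank;
            for (int j = 0; j < rank; j++) {
                sum += items[off + j] * users[xOff + j];
            }
            y[i] = sum;
        }
    }

    // Indices of the k largest scores among scores[0..n), ordered by descending score.
    // A min-heap of int indices is used, so no (index, score) pair objects are created.
    static int[] topK(float[] scores, int n, int k) {
        int size = Math.max(0, Math.min(k, n));
        int[] heap = new int[size];
        int count = 0;
        for (int i = 0; i < n; i++) {
            if (count < size) {
                heap[count] = i;
                siftUp(heap, count, scores);
                count++;
            } else if (size > 0 && scores[i] > scores[heap[0]]) {
                heap[0] = i; // replace the current minimum
                siftDown(heap, 0, count, scores);
            }
        }
        // Pop the minimum repeatedly and fill the result from the back.
        int[] result = new int[count];
        for (int end = count; end > 0; end--) {
            result[end - 1] = heap[0];
            heap[0] = heap[end - 1];
            siftDown(heap, 0, end - 1, scores);
        }
        return result;
    }

    private static void siftUp(int[] heap, int i, float[] s) {
        while (i > 0) {
            int p = (i - 1) / 2;
            if (s[heap[i]] >= s[heap[p]]) break;
            int t = heap[i]; heap[i] = heap[p]; heap[p] = t;
            i = p;
        }
    }

    private static void siftDown(int[] heap, int i, int size, float[] s) {
        while (true) {
            int l = 2 * i + 1;
            if (l >= size) break;
            int r = l + 1;
            int m = (r < size && s[heap[r]] < s[heap[l]]) ? r : l;
            if (s[heap[i]] <= s[heap[m]]) break;
            int t = heap[i]; heap[i] = heap[m]; heap[m] = t;
            i = m;
        }
    }

    // Scores m users against n items, one user at a time.
    // The score buffer has length n and is reused, instead of an m*n matrix.
    static int[][] recommend(float[] users, int m, float[] items, int n, int rank, int k) {
        float[] buffer = new float[n];
        int[][] out = new int[m][];
        for (int u = 0; u < m; u++) {
            gemv(items, n, rank, users, u * rank, buffer);
            out[u] = topK(buffer, n, k);
        }
        return out;
    }
}
```

   For example, with `rank = 2`, items `[1,0], [0,1], [1,1]` and users `[1,2], 
[3,-1]`, `recommend(users, 2, items, 3, 2, 2)` gives item indices `[2, 1]` for 
the first user and `[0, 2]` for the second. A `GEMM` variant would compute all 
m*n scores in one call, which is where the larger buffer comes from.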
   
   The above tests were run locally, since I do not have a clean cluster for now.
   Only `f2jBLAS` was used: since upgrading to Ubuntu 20.04, I have not been 
able to link netlib-java to the native impls.
   
   
   Friendly ping @srowen @MLnick @mpjlu @jkbradley @mengxr @WeichenXu123, 
because of your comments on previous PRs 
(https://github.com/apache/spark/pull/17742, 
https://github.com/apache/spark/pull/17845, 
https://github.com/apache/spark/pull/18624)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]