luhenry commented on pull request #30810: URL: https://github.com/apache/spark/pull/30810#issuecomment-748687656
I updated the PR to depend on the package `dev.ludovic.vectorizedblas-blas` instead, as that makes it much easier for me to evolve the algorithms independently of Spark and simplifies integration with the build system. Let me know if that compromise is good for you.

As for the latest results, it's looking much better:

```
[info] f2jBLAS    = com.github.fommil.netlib.F2jBLAS
[info] vectorBLAS = dev.ludovic.blas.VectorizedBLAS
[info]
[info] daxpy:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                      45             45           1        223.7           4.5       1.0X
[info] vector                   26             26           3        389.4           2.6       1.7X
[info]
[info] Unknown processor
[info] sdot:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                      53             53           1        190.4           5.3       1.0X
[info] vector                   16             17           1        607.4           1.6       3.2X
[info]
[info] Unknown processor
[info] ddot:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                      73             73           1        137.3           7.3       1.0X
[info] vector                   35             36           3        282.8           3.5       2.1X
[info]
[info] dscal:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                      36             36           1        279.5           3.6       1.0X
[info] vector                   21             21           0        481.4           2.1       1.7X
[info]
[info] dgemv[T]:     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                      35             36           0          0.0       35458.6       1.0X
[info] vector                   23             23           0          0.0       23415.6       1.5X
[info]
[info] dgemm[T,N]:   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                     275            276           1          0.4        2753.6       1.0X
[info] vector                  149            166         169          0.7        1488.5       1.8X
```

For much more detailed performance numbers on x86 (with AVX2), I'm currently running a JMH benchmark covering more cases. I'll link to it as soon as it finishes (by tomorrow morning CET).
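
For readers less familiar with the BLAS routine names in the benchmark, here is a minimal scalar sketch of what `daxpy`, `ddot`, and `dscal` compute. It is an illustration only: the class name `BlasSketch` and the simplified unit-stride signatures are assumptions made for this example, and it is neither the F2jBLAS code nor the vectorized implementation being benchmarked.

```java
// Hypothetical reference loops for three BLAS level-1 operations.
// Simplified on purpose: unit stride only, no offsets or increments.
public final class BlasSketch {
    private BlasSketch() {}

    /** daxpy: y := alpha * x + y over the first n elements. */
    public static void daxpy(int n, double alpha, double[] x, double[] y) {
        for (int i = 0; i < n; i++) {
            y[i] += alpha * x[i];
        }
    }

    /** ddot: returns the dot product of the first n elements of x and y. */
    public static double ddot(int n, double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }

    /** dscal: x := alpha * x over the first n elements. */
    public static void dscal(int n, double alpha, double[] x) {
        for (int i = 0; i < n; i++) {
            x[i] *= alpha;
        }
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0};
        double[] y = {4.0, 5.0, 6.0};
        System.out.println("ddot  = " + ddot(3, x, y));
        daxpy(3, 2.0, x, y);
        System.out.println("daxpy = " + java.util.Arrays.toString(y));
        dscal(3, 0.5, x);
        System.out.println("dscal = " + java.util.Arrays.toString(x));
    }
}
```

Loops of this shape touch each element once with independent arithmetic per index, which is the kind of work SIMD instructions can process several elements at a time.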
