[Rd] accelerating matrix multiply

Cohn, Robert S Sat, 07 Jan 2017 09:22:13 -0800

I am using R to multiply some large (30k x 30k double) matrices on a 64 core 
machine (xeon phi).  I added some timers to src/main/array.c to see where the 
time is going. All of the time is being spent in the matprod function, most of 
that time is spent in dgemm. 15 seconds is in matprod in some code that is 
checking if there are NaNs.


> system.time (C <- B %*% A)
nancheck: wall time 15.240282s
  dgemm: wall time 43.111064s
matprod: wall time 58.351572s
    user   system  elapsed 
2710.154   20.999   58.398

The NaN checking code is not being vectorized because of the early exit when 
NaN is detected:

        /* Don't trust the BLAS to handle NA/NaNs correctly: PR#4582
         * The test is only O(n) here.
         */
        for (R_xlen_t i = 0; i < NRX*ncx; i++)
            if (ISNAN(x[i])) {have_na = TRUE; break;}
        if (!have_na)
            for (R_xlen_t i = 0; i < NRY*ncy; i++)
                if (ISNAN(y[i])) {have_na = TRUE; break;}

I tried deleting the 'break'. By inspecting the asm code, I verified that the 
loop was not being vectorized before, but now is vectorized. Total time goes 
down:

system.time (C <- B %*% A)
nancheck: wall time 1.898667s
  dgemm: wall time 43.913621s
matprod: wall time 45.812468s
   user   system  elapsed 
2727.877   20.723   45.859

The break accelerates the case when there is a NaN, at the expense of the much 
more common case when there isn't a NaN. If a NaN is detected, it doesn't call 
dgemm and calls its own matrix multiply, which makes the NaN check time 
insignificant so I doubt the early exit provides any benefit.

I was a little surprised that the O(n) NaN check is costly compared to the 
O(n**2) dgemm that follows. I think the reason is that nan check is single 
thread and not vectorized, and my machine can do 2048 floating point ops/cycle 
when you consider the cores/dual issue/8 way SIMD/muladd, and the constant 
factor will be significant for even large matrices.

Would you consider deleting the breaks? I can submit a patch if that will help. 
Thanks.

Robert

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] accelerating matrix multiply

Reply via email to