> 
> I just did that. Here are the types:
> 
>  real-matrix* : (Array Real) (Array Real) -> (Array Real)
> 
>  flonum-matrix* : (Array Flonum) (Array Flonum) -> (Array Flonum)
> 
>  flmatrix* : FlArray FlArray -> FlArray
> 
> Results so far, measured in DrRacket with debugging off:
> 
> Function           Size              Time
> -----------------------------------------
> matrix*            100x100          340ms
> real-matrix*       100x100           40ms
> flonum-matrix*     100x100           10ms
> flmatrix*          100x100            6ms
> 
> matrix*           1000x1000      418000ms
> real-matrix*      1000x1000       76000ms
> flonum-matrix*    1000x1000        7000ms
> flmatrix*         1000x1000        4900ms
> 
> The only difference between `real-matrix*' and `flonum-matrix*' is that the 
> former uses `+' and `*' and the latter uses `fl+' and `fl*'. But if I can 
> inline `real-matrix*', TR's optimizer will change the former to the latter, 
> making `flonum-matrix*' unnecessary. (FWIW, this would be the largest speedup 
> TR's optimizer will have ever shown me.)
> 
> It looks like the biggest speedup comes from doing only flonum ops in the 
> inner loop sum, which keeps all the intermediate flonums unboxed (i.e. not 
> heap-allocated or later garbage-collected).
> 
> Right now, `flmatrix*' is implemented a bit stupidly, so I could speed it up 
> further. I won't yet, because I haven't settled on a type for matrices of 
> unboxed flonums. The type has to work with LAPACK if it's installed, which 
> `FlArray' doesn't do because its data layout is row-major and LAPACK expects 
> column-major.
> 
> I'll change `matrix*' to look like `real-matrix*'. It won't give the very 
> best performance, but it's a 60x speedup for 1000x1000 matrices.
> 
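
To make the quoted point about `+'/`*' versus `fl+'/`fl*' concrete, here is a minimal sketch of the two kinds of inner loop. It is not the math library's code: `real-dot' and `fl-dot' are hypothetical names, and plain vectors and flvectors stand in for the array types above. It only assumes Typed Racket and `racket/flonum'.

```racket
#lang typed/racket
(require racket/flonum)

;; Hypothetical helper: generic inner-loop sum over Reals.
;; Uses `+' and `*', so intermediate results may be boxed.
(: real-dot (-> (Vectorof Real) (Vectorof Real) Real))
(define (real-dot xs ys)
  (for/fold ([sum : Real 0])
            ([x (in-vector xs)] [y (in-vector ys)])
    (+ sum (* x y))))

;; Hypothetical helper: the same sum restricted to flonums.
;; Uses only `fl+' and `fl*', the flonum-specific operations.
(: fl-dot (-> FlVector FlVector Flonum))
(define (fl-dot xs ys)
  (for/fold ([sum : Flonum 0.0])
            ([x (in-flvector xs)] [y (in-flvector ys)])
    (fl+ sum (fl* x y))))

;; Small example inputs (illustrative only).
(define rs1 : (Vectorof Real) (vector 1 2 3))
(define rs2 : (Vectorof Real) (vector 4 5 6))
(define fs1 : FlVector (flvector 1.0 2.0 3.0))
(define fs2 : FlVector (flvector 4.0 5.0 6.0))
```

Both functions compute the same dot product; the second is the shape of loop where, per the message above, the intermediate flonums can stay unboxed.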

These results look very promising, especially if, as you mentioned, 
real-matrix* will in the end automatically reach flonum-matrix* performance 
for Flonums, and flmatrix* automatically switches to a LAPACK-based variant 
when available. For the latter, it would be great if one could even change 
the library used, e.g., to redirect to an installation of Intel's highly 
efficient MKL library.

Looking forward to benchmarking it against NumPy and Mathematica (which is 
MKL-based) again!

Berthold


____________________
  Racket Users list:
  http://lists.racket-lang.org/users