On 01/21/2013 04:33 PM, Berthold Bäuml wrote:
I just did that. Here are the types:
real-matrix*   : (Array Real)   (Array Real)   -> (Array Real)
flonum-matrix* : (Array Flonum) (Array Flonum) -> (Array Flonum)
flmatrix*      : FlArray FlArray -> FlArray
Results so far, measured in DrRacket with debugging off:
Function          Size        Time
---------------------------------------
matrix*           100x100       340ms
real-matrix*      100x100        40ms
flonum-matrix*    100x100        10ms
flmatrix*         100x100         6ms
matrix*           1000x1000  418000ms
real-matrix*      1000x1000   76000ms
flonum-matrix*    1000x1000    7000ms
flmatrix*         1000x1000    4900ms
The only difference between `real-matrix*' and `flonum-matrix*' is that the
former uses `+' and `*' while the latter uses `fl+' and `fl*'. But if I can
inline `real-matrix*', TR's optimizer will change the former ops into the
latter, making `flonum-matrix*' unnecessary. (FWIW, this would be the largest
speedup TR's optimizer has ever shown me.)
It looks like the biggest speedup comes from doing only flonum ops in the inner
loop sum, which keeps all the intermediate flonums unboxed (i.e. not
heap-allocated or later garbage-collected).
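To make that concrete, here is a minimal sketch (not the library's actual
code) of the two styles of inner-loop sum, over hypothetical n x n row-major
FlVectors:

  #lang racket
  (require racket/flonum)

  ;; Hypothetical sketch: dot product of row i of `a' with column j of `b'.

  ;; Generic ops: every intermediate sum is a boxed, heap-allocated number.
  (define (dot-generic a b i j n)
    (for/fold ([sum 0.0]) ([k (in-range n)])
      (+ sum (* (flvector-ref a (+ (* i n) k))
                (flvector-ref b (+ (* k n) j))))))

  ;; Flonum-only ops: `fl+' and `fl*' let the compiler keep `sum' unboxed.
  (define (dot-flonum a b i j n)
    (for/fold ([sum 0.0]) ([k (in-range n)])
      (fl+ sum (fl* (flvector-ref a (+ (* i n) k))
                    (flvector-ref b (+ (* k n) j))))))

Inlining `real-matrix*' is what would let TR's optimizer rewrite the first
form into the second when the element type is Flonum.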
Right now, `flmatrix*' is implemented a bit stupidly, so I could speed it up
further. I won't yet, because I haven't settled on a type for matrices of
unboxed flonums. The type has to work with LAPACK if it's installed, which
`FlArray' doesn't do because its data layout is row-major and LAPACK expects
column-major.
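For reference, the two layouts put the same (i, j) element of an m x n matrix
at different flat offsets; a tiny sketch of the two index formulas:

  ;; Flat offset of element (i, j) in an m x n matrix:
  (define (row-major-index i j m n) (+ (* i n) j))  ; FlArray's layout
  (define (col-major-index i j m n) (+ (* j m) i))  ; LAPACK's (Fortran) layout

For a 2x3 matrix, element (0, 1) sits at offset 1 row-major but offset 2
column-major, so an FlArray's buffer can't be handed to LAPACK as-is.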
I'll change `matrix*' to look like `real-matrix*'. It won't give the very best
performance, but it's a 60x speedup for 1000x1000 matrices.
These results look very promising, especially if, as you mentioned,
`real-matrix*' will in the end automatically reach `flonum-matrix*'
performance on Flonums, and `flmatrix*' will automatically switch to a
LAPACK-based variant when one is available. For the latter, it would be great
if one could even change the library used, e.g., to redirect to an
installation of Intel's highly efficient MKL library.
Looking forward to benchmarking it against NumPy and Mathematica (which is
MKL-based) again!
The next release candidate will have a `matrix*' that runs as fast as
`flonum-matrix*' on (Matrix Flonum) arguments, and as fast as
`real-matrix*' on (Matrix Real) arguments.
(Curiously, I wasn't able to speed up `flmatrix*' at all. The "a bit
stupidly" implementation was actually the fastest I could do, which
probably means it's as fast as it can be done in Racket without dropping
to C.)
Additionally, all the matrix functions will have more precise types to
make it easy for TR to prove that operations on (Matrix Flonum) values
yield (Matrix Flonum) values. (Currently, it only proves (Matrix Real)
and (Matrix Number).) Example:
> (:print-type (matrix-qr (matrix [[1.0 2.0] [3.0 4.0]])))
(Values (Array Flonum) (Array Flonum))
> (:print-type (matrix-qr (matrix [[1.0+0.0i 2.0+0.0i]
[3.0+0.0i 4.0+0.0i]])))
(Values (Array Float-Complex) (Array Float-Complex))
The current release candidate returns two (Array Real)s and two (Array
Number)s instead.
Since I've brought up decompositions, you should know that these,
`matrix-inverse', `matrix-solve', and pretty much every O(n^3) operation
except multiplication will be slow on large matrices. I can't think of a
way to inline them in a way that TR can optimize. For fast versions,
which will use LAPACK if it's installed, you'll have to wait for 5.3.3. :/
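If you want to experiment before then, something along these lines should
work with Racket's FFI. This is only a sketch under assumptions: the library
name "liblapack" varies by platform, and `dgetrf_' (LAPACK's LU
factorization) is just an example routine, not what the math library will
actually bind.

  #lang racket
  (require ffi/unsafe ffi/unsafe/define)

  ;; Hypothetical binding; not the math library's actual code.
  (define-ffi-definer define-lapack (ffi-lib "liblapack"))

  ;; dgetrf_ LU-factorizes an m x n column-major double matrix in place.
  ;; Fortran passes all arguments by reference.
  (define-lapack dgetrf_
    (_fun (m    : (_ptr i _int))
          (n    : (_ptr i _int))
          (a    : _pointer)        ; m*n doubles, column-major; overwritten
          (lda  : (_ptr i _int))   ; leading dimension, usually m
          (ipiv : _pointer)        ; min(m, n) ints; pivot indices (output)
          (info : (_ptr o _int))   ; 0 on success
          -> _void
          -> info))

Swapping the string given to `ffi-lib' is all it would take to point this at
a different implementation.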
Is the MKL library binary-compatible with LAPACK?
Neil ⊥