On 22.01.2013, at 06:19, Neil Toronto <neil.toro...@gmail.com> wrote:

> On 01/21/2013 04:33 PM, Berthold Bäuml wrote:
>> 
>>> 
>>> I just did that. Here are the types:
>>> 
>>>  real-matrix* : (Array Real) (Array Real) -> (Array Real)
>>> 
>>>  flonum-matrix* : (Array Flonum) (Array Flonum) -> (Array Flonum)
>>> 
>>>  flmatrix* : FlArray FlArray -> FlArray
>>> 
>>> Results so far, measured in DrRacket with debugging off:
>>> 
>>> Function           Size              Time
>>> -----------------------------------------
>>> matrix*            100x100          340ms
>>> real-matrix*       100x100           40ms
>>> flonum-matrix*     100x100           10ms
>>> flmatrix*          100x100            6ms
>>> 
>>> matrix*           1000x1000      418000ms
>>> real-matrix*      1000x1000       76000ms
>>> flonum-matrix*    1000x1000        7000ms
>>> flmatrix*         1000x1000        4900ms
>>> 
>>> The only difference between `real-matrix*' and `flonum-matrix*' is that the 
>>> former uses `+' and `*' and the latter uses `fl+' and `fl*'. But if I can 
>>> inline `real-matrix*', TR's optimizer will change the former to the latter, 
>>> making `flonum-matrix*' unnecessary. (FWIW, this would be the largest 
>>> speedup TR's optimizer will have ever shown me.)
>>> 
>>> It looks like the biggest speedup comes from doing only flonum ops in the 
>>> inner loop sum, which keeps all the intermediate flonums unboxed (i.e. not 
>>> heap-allocated or later garbage-collected).
>>> 
>>> Right now, `flmatrix*' is implemented a bit stupidly, so I could speed it 
>>> up further. I won't yet, because I haven't settled on a type for matrices 
>>> of unboxed flonums. The type has to work with LAPACK if it's installed, 
>>> which `FlArray' doesn't do because its data layout is row-major and LAPACK 
>>> expects column-major.
>>> 
>>> I'll change `matrix*' to look like `real-matrix*'. It won't give the very 
>>> best performance, but it's a 60x speedup for 1000x1000 matrices.
>>> 
>> 
>> These results look very promising, esp. if , as you mentioned, in the end 
>> the real-matrix* will automatically reach the flonum-matrix* performance for 
>> Flonums and the flmatrix* automatically switches to a LAPACK based variant, 
>> when available. For the latter it would be great if one could even change 
>> the used library to, e.g., redirect to a installation of the highly 
>> efficient MKL library from Intel.
>> 
>> Looking forward to benchmark it against Numpy and Mathematica (which is MKL 
>> based) again!
> 
> The next release candidate will have a `matrix*' that runs as fast as 
> `flonum-matrix*' on (Matrix Flonum) arguments, and as fast as `real-matrix*' 
> on (Matrix Real) arguments.

Cool!

> (Curiously, I wasn't able to speed up `flmatrix*' at all. The "a bit 
> stupidly" implementation was actually the fastest I could do, which probably 
> means it's as fast as it can be done in Racket without dropping to C.)
> 
> Additionally, all the matrix functions will have more precise types to make 
> it easy for TR to prove that operations on (Matrix Flonum) values yield 
> (Matrix Flonum) values. (Currently, it only proves (Matrix Real) and (Matrix 
> Number).) Example:
> 
>  > (:print-type (matrix-qr (matrix [[1.0 2.0] [3.0 4.0]])))
>  (Values (Array Flonum) (Array Flonum))
>  > (:print-type (matrix-qr (matrix [[1.0+0.0i 2.0+0.0i]
>                                     [3.0+0.0i 4.0+0.0i]])))
>  (Values (Array Float-Complex) (Array Float-Complex))
> 
> The current release candidate returns two (Array Real) and (Array Number) 
> instead.
> 
> Since I've brought up decompositions, you should know that these, 
> `matrix-inverse', `matrix-solve', and pretty much every O(n^3) operation 
> except multiplication will be slow on large matrices. I can't think of a way 
> to inline them in a way that TR can optimize. For fast versions, which will 
> use LAPACK if it's installed, you'll have to wait for 5.3.3. :/

Will the LAPACK bindings be in the git head sooner, maybe ;-)
> 

> Is the MKL library binary-compatible with LAPACK?
> 

I did not try it myself yet, but the Intel documentation does say so:
"If your application already relies on the BLAS or LAPACK functionality, simply 
re-link with Intel® MKL to get better performance on Intel and compatible 
architectures."
http://software.intel.com/en-us/intel-mkl/

Berthold


> Neil ⊥
> 



____________________
  Racket Users list:
  http://lists.racket-lang.org/users

Reply via email to