Hi Eric, You were absolute right. The version below cuts the time in half. It is mostly cut and paste from existing functions and removing non-Flonum cases.
/Jens Axel #lang typed/racket (require math/matrix math/array math/private/matrix/utils math/private/vector/vector-mutate math/private/unsafe (only-in racket/unsafe/ops unsafe-fl/) racket/fixnum racket/flonum racket/list) (define-type Pivoting (U 'first 'partial)) (: flonum-matrix-gauss-elim (case-> ((Matrix Flonum) -> (Values (Matrix Flonum) (Listof Index))) ((Matrix Flonum) Any -> (Values (Matrix Flonum) (Listof Index))) ((Matrix Flonum) Any Any -> (Values (Matrix Flonum) (Listof Index))) ((Matrix Flonum) Any Any Pivoting -> (Values (Matrix Flonum) (Listof Index))))) (define (flonum-matrix-gauss-elim M [jordan? #f] [unitize-pivot? #f] [pivoting 'partial]) (define-values (m n) (matrix-shape M)) (define rows (matrix->vector* M)) (let loop ([#{i : Nonnegative-Fixnum} 0] [#{j : Nonnegative-Fixnum} 0] [#{without-pivot : (Listof Index)} empty]) (cond [(j . fx>= . n) (values (vector*->matrix rows) (reverse without-pivot))] [(i . fx>= . m) (values (vector*->matrix rows) ;; None of the rest of the columns can have pivots (let loop ([#{j : Nonnegative-Fixnum} j] [without-pivot without-pivot]) (cond [(j . fx< . n) (loop (fx+ j 1) (cons j without-pivot))] [else (reverse without-pivot)])))] [else (define-values (p pivot) (case pivoting [(partial) (find-partial-pivot rows m i j)] [(first) (find-first-pivot rows m i j)])) (cond [(zero? pivot) (loop i (fx+ j 1) (cons j without-pivot))] [else ;; Swap pivot row with current (vector-swap! rows i p) ;; Possibly unitize the new current row (let ([pivot (if unitize-pivot? (begin (vector-scale! (unsafe-vector-ref rows i) (unsafe-fl/ 1. pivot)) (unsafe-fl/ pivot pivot)) pivot)]) (flonum-elim-rows! rows m i j pivot (if jordan? 0 (fx+ i 1))) (loop (fx+ i 1) (fx+ j 1) without-pivot))])]))) (: flonum-elim-rows! ((Vectorof (Vectorof Flonum)) Index Index Index Flonum Nonnegative-Fixnum -> Void)) (define (flonum-elim-rows! rows m i j pivot start) (define row_i (unsafe-vector-ref rows i)) (let loop ([#{l : Nonnegative-Fixnum} start]) (when (l . fx< . m) (unless (l . fx= . i) (define row_l (unsafe-vector-ref rows l)) (define x_lj (unsafe-vector-ref row_l j)) (unless (= x_lj 0) (flonum-vector-scaled-add! row_l row_i (fl* -1. (fl/ x_lj pivot)) j) ;; Make sure the element below the pivot is zero (unsafe-vector-set! row_l j (- x_lj x_lj)))) (loop (fx+ l 1))))) (: flonum-matrix-solve (All (A) (case-> ((Matrix Flonum) (Matrix Flonum) -> (Matrix Flonum)) ((Matrix Flonum) (Matrix Flonum) (-> A) -> (U A (Matrix Flonum)))))) (define flonum-matrix-solve (case-lambda [(M B) (flonum-matrix-solve M B (λ () (raise-argument-error 'flonum-matrix-solve "matrix-invertible?" 0 M B)))] [(M B fail) (define m (square-matrix-size M)) (define-values (s t) (matrix-shape B)) (cond [(= m s) (define-values (IX wps) (parameterize ([array-strictness #f]) (flonum-matrix-gauss-elim (matrix-augment (list M B)) #t #t))) (cond [(and (not (empty? wps)) (= (first wps) m)) (submatrix IX (::) (:: m #f))] [else (fail)])] [else (error 'flonum-matrix-solve "matrices must have the same number of rows; given ~e and ~e" M B)])])) (define-syntax-rule (flonum-vector-generic-scaled-add! vs0-expr vs1-expr v-expr start-expr + *) (let* ([vs0 vs0-expr] [vs1 vs1-expr] [v v-expr] [n (fxmin (vector-length vs0) (vector-length vs1))]) (let loop ([#{i : Nonnegative-Fixnum} (fxmin start-expr n)]) (if (i . fx< . n) (begin (unsafe-vector-set! vs0 i (+ (unsafe-vector-ref vs0 i) (* (unsafe-vector-ref vs1 i) v))) (loop (fx+ i 1))) (void))))) (: flonum-vector-scaled-add! (case-> ((Vectorof Flonum) (Vectorof Flonum) Flonum -> Void) ((Vectorof Flonum) (Vectorof Flonum) Flonum Index -> Void))) (define (flonum-vector-scaled-add! vs0 vs1 s [start 0]) (flonum-vector-generic-scaled-add! vs0 vs1 s start + *)) (: mx Index) (define mx 600) (: r (Index Index -> Flonum)) (define (r i j) (random)) (: A : (Matrix Flonum)) (define A (build-matrix mx mx r)) (: sum : Integer Integer -> Flonum) (define (sum i n) (let loop ((j 0) (acc 0.0)) (if (>= j mx) acc (loop (+ j 1) (+ acc (matrix-ref A i j))) ))) (: b : (Matrix Flonum)) (define b (build-matrix mx 1 sum)) (time (let [(m (flonum-matrix-solve A b))] (matrix-ref m 0 0))) (time (let [(m (matrix-solve A b))] (matrix-ref m 0 0))) (time (let [(m (flonum-matrix-solve A b))] (matrix-ref m 0 0))) (time (let [(m (matrix-solve A b))] (matrix-ref m 0 0))) (time (let [(m (flonum-matrix-solve A b))] (matrix-ref m 0 0))) (time (let [(m (matrix-solve A b))] (matrix-ref m 0 0))) /Jens Axel 2014-05-11 23:26 GMT+02:00 Eric Dobson <eric.n.dob...@gmail.com>: > Where is the time spent in the algorithm? I assume that most of it is > in the matrix manipulation work not the orchestration of finding a > pivot and reducing based on that. I.e. `elim-rows!` is the expensive > part. Given that you only specialized the outer part of the loop, I > wouldn't expect huge performance changes. > > On Sun, May 11, 2014 at 2:13 PM, Jens Axel Søgaard > <jensa...@soegaard.net> wrote: >> I tried restricting the matrix-solve and matrix-gauss-elim to (Matrix >> Flonum). >> I can't observe a change in the timings. >> >> #lang typed/racket >> (require math/matrix >> math/array >> math/private/matrix/utils >> math/private/vector/vector-mutate >> math/private/unsafe >> (only-in racket/unsafe/ops unsafe-fl/) >> racket/fixnum >> racket/list) >> >> (define-type Pivoting (U 'first 'partial)) >> >> (: flonum-matrix-gauss-elim >> (case-> ((Matrix Flonum) -> (Values (Matrix Flonum) (Listof Index))) >> ((Matrix Flonum) Any -> (Values (Matrix Flonum) (Listof Index))) >> ((Matrix Flonum) Any Any -> (Values (Matrix Flonum) (Listof >> Index))) >> ((Matrix Flonum) Any Any Pivoting -> (Values (Matrix >> Flonum) (Listof Index))))) >> (define (flonum-matrix-gauss-elim M [jordan? #f] [unitize-pivot? #f] >> [pivoting 'partial]) >> (define-values (m n) (matrix-shape M)) >> (define rows (matrix->vector* M)) >> (let loop ([#{i : Nonnegative-Fixnum} 0] >> [#{j : Nonnegative-Fixnum} 0] >> [#{without-pivot : (Listof Index)} empty]) >> (cond >> [(j . fx>= . n) >> (values (vector*->matrix rows) >> (reverse without-pivot))] >> [(i . fx>= . m) >> (values (vector*->matrix rows) >> ;; None of the rest of the columns can have pivots >> (let loop ([#{j : Nonnegative-Fixnum} j] [without-pivot >> without-pivot]) >> (cond [(j . fx< . n) (loop (fx+ j 1) (cons j >> without-pivot))] >> [else (reverse without-pivot)])))] >> [else >> (define-values (p pivot) >> (case pivoting >> [(partial) (find-partial-pivot rows m i j)] >> [(first) (find-first-pivot rows m i j)])) >> (cond >> [(zero? pivot) (loop i (fx+ j 1) (cons j without-pivot))] >> [else >> ;; Swap pivot row with current >> (vector-swap! rows i p) >> ;; Possibly unitize the new current row >> (let ([pivot (if unitize-pivot? >> (begin (vector-scale! (unsafe-vector-ref rows i) >> (unsafe-fl/ 1. pivot)) >> (unsafe-fl/ pivot pivot)) >> pivot)]) >> (elim-rows! rows m i j pivot (if jordan? 0 (fx+ i 1))) >> (loop (fx+ i 1) (fx+ j 1) without-pivot))])]))) >> >> (: flonum-matrix-solve >> (All (A) (case-> >> ((Matrix Flonum) (Matrix Flonum) -> (Matrix Flonum)) >> ((Matrix Flonum) (Matrix Flonum) (-> A) -> (U A (Matrix >> Flonum)))))) >> (define flonum-matrix-solve >> (case-lambda >> [(M B) (flonum-matrix-solve >> M B (λ () (raise-argument-error 'flonum-matrix-solve >> "matrix-invertible?" 0 M B)))] >> [(M B fail) >> (define m (square-matrix-size M)) >> (define-values (s t) (matrix-shape B)) >> (cond [(= m s) >> (define-values (IX wps) >> (parameterize ([array-strictness #f]) >> (flonum-matrix-gauss-elim (matrix-augment (list M B)) #t >> #t))) >> (cond [(and (not (empty? wps)) (= (first wps) m)) >> (submatrix IX (::) (:: m #f))] >> [else (fail)])] >> [else >> (error 'flonum-matrix-solve >> "matrices must have the same number of rows; given ~e and >> ~e" >> M B)])])) >> >> (: mx Index) >> (define mx 600) >> >> (: r (Index Index -> Flonum)) >> (define (r i j) (random)) >> >> (: A : (Matrix Flonum)) >> (define A (build-matrix mx mx r)) >> >> (: sum : Integer Integer -> Flonum) >> (define (sum i n) >> (let loop ((j 0) (acc 0.0)) >> (if (>= j mx) acc >> (loop (+ j 1) (+ acc (matrix-ref A i j))) ))) >> >> (: b : (Matrix Flonum)) >> (define b (build-matrix mx 1 sum)) >> >> (time >> (let [(m (flonum-matrix-solve A b))] >> (matrix-ref m 0 0))) >> (time >> (let [(m (matrix-solve A b))] >> (matrix-ref m 0 0))) >> >> (time >> (let [(m (flonum-matrix-solve A b))] >> (matrix-ref m 0 0))) >> (time >> (let [(m (matrix-solve A b))] >> (matrix-ref m 0 0))) >> >> (time >> (let [(m (flonum-matrix-solve A b))] >> (matrix-ref m 0 0))) >> (time >> (let [(m (matrix-solve A b))] >> (matrix-ref m 0 0))) >> >> 2014-05-11 21:48 GMT+02:00 Neil Toronto <neil.toro...@gmail.com>: >>> The garbage collection time is probably from cleaning up boxed flonums, and >>> possibly intermediate vectors. If so, a separate implementation of Gaussian >>> elimination for the FlArray type would cut the GC time to nearly zero. >>> >>> Neil ⊥ >>> >>> >>> On 05/11/2014 01:36 PM, Jens Axel Søgaard wrote: >>>> >>>> Or ... you could take a look at >>>> >>>> >>>> https://github.com/plt/racket/blob/master/pkgs/math-pkgs/math-lib/math/private/matrix/matrix-gauss-elim.rkt >>>> >>>> at see if something can be improved. >>>> >>>> /Jens Axel >>>> >>>> >>>> 2014-05-11 21:30 GMT+02:00 Jens Axel Søgaard <jensa...@soegaard.net>: >>>>> >>>>> Hi Eduardo, >>>>> >>>>> The math/matrix library uses the arrays from math/array to represent >>>>> matrices. >>>>> >>>>> If you want to try the same representation as Bigloo, you could try Will >>>>> Farr's >>>>> matrix library: >>>>> >>>>> >>>>> http://planet.racket-lang.org/package-source/wmfarr/simple-matrix.plt/1/1/planet-docs/simple-matrix/index.html >>>>> >>>>> I am interested in hearing the results. >>>>> >>>>> /Jens Axel >>>>> >>>>> >>>>> >>>>> 2014-05-11 21:18 GMT+02:00 Eduardo Costa <edu50...@gmail.com>: >>>>>> >>>>>> What is bothering me is the time Racket is spending in garbage >>>>>> collection. >>>>>> >>>>>> ~/wrk/scm/rkt/matrix$ racket matrix.rkt >>>>>> 0.9999999999967226 >>>>>> cpu time: 61416 real time: 61214 gc time: 32164 >>>>>> >>>>>> If I am reading the output correctly, Racket is spending 32 seconds out >>>>>> of >>>>>> 61 seconds in garbage collection. >>>>>> >>>>>> I am following Junia Magellan's computer language comparison and I >>>>>> cannot >>>>>> understand why Racket needs the garbage collector for doing Gaussian >>>>>> elimination. In a slow Compaq/HP machine, solving a system of 800 linear >>>>>> equations takes 17.3 seconds in Bigloo, but requires 58 seconds in >>>>>> Racket, >>>>>> even after removing the building of the linear system from >>>>>> consideration. >>>>>> Common Lisp is also much faster than Racket in processing arrays. I >>>>>> would >>>>>> like to point out that Racket is very fast in general. The only occasion >>>>>> that it lags badly behind Common Lisp and Bigloo is when one needs to >>>>>> deal >>>>>> with arrays. >>>>>> >>>>>> Basically, Junia is using Rasch method to measure certain latent traits >>>>>> of >>>>>> computer languages, like productivity and coaching time. In any case, >>>>>> she >>>>>> needs to do a lot of matrix calculations to invert the Rasch model. >>>>>> Since >>>>>> Bigloo works with homogeneous vectors, she wrote a few macros to access >>>>>> the >>>>>> elements of a matrix: >>>>>> >>>>>> (define (mkv n) (make-f64vector n)) >>>>>> (define $ f64vector-ref) >>>>>> (define $! f64vector-set!) >>>>>> (define len f64vector-length) >>>>>> >>>>>> (define-syntax $$ >>>>>> (syntax-rules () >>>>>> (($$ m i j) (f64vector-ref (vector-ref m i) j)))) >>>>>> >>>>>> (define-syntax $$! >>>>>> (syntax-rules () >>>>>> (($$! matrix row column value) >>>>>> ($! (vector-ref matrix row) column value)))) >>>>>> >>>>>> I wonder whether homogeneous vectors would speed up Racket. In the same >>>>>> computer that Racket takes 80 seconds to build and invert a system of >>>>>> equations, Bigloo takes 17.3 seconds, as I told before. Common Lisp is >>>>>> even >>>>>> faster. However, if one subtracts the gc time from Racket's total time, >>>>>> the >>>>>> result comes quite close to Common Lisp or Bigloo. >>>>>> >>>>>> ~/wrk/bgl$ bigloo -Obench bigmat.scm -o big >>>>>> ~/wrk/bgl$ time ./big >>>>>> 0.9999999999965746 1.000000000000774 0.9999999999993039 >>>>>> 0.9999999999982576 >>>>>> 1.000000000007648 0.999999999996588 >>>>>> >>>>>> real 0m17.423s >>>>>> user 0m17.384s >>>>>> sys 0m0.032s >>>>>> ~/wrk/bgl$ >>>>>> >>>>>> Well, bigloo may perform global optimizations, but Common Lisp doesn't. >>>>>> When >>>>>> one is not dealing with matrices, Racket is faster than Common Lisp. I >>>>>> hope >>>>>> you can tell me how to rewrite the program in order to avoid garbage >>>>>> collection. >>>>>> >>>>>> By the way, you may want to know why not use Bigloo or Common Lisp to >>>>>> invert >>>>>> the Rasch model. The problem is that Junia and her co-workers are using >>>>>> hosting services that do not give access to the server or to the >>>>>> jailshell. >>>>>> Since Bigloo requires gcc based compilation, Junia discarded it right >>>>>> away. >>>>>> Not long ago, the hosting service stopped responding to the sbcl Common >>>>>> Lisp >>>>>> compiler for reasons that I cannot fathom. Although Racket 6.0 stopped >>>>>> working too, Racket 6.0.1 is working fine. This left Junia, her >>>>>> co-workers >>>>>> and students with Racket as their sole option. As for myself, I am just >>>>>> curious. >>>>>> >>>>>> >>>>>> 2014-05-11 6:23 GMT-03:00 Jens Axel Søgaard <jensa...@soegaard.net>: >>>>>> >>>>>>> 2014-05-11 6:09 GMT+02:00 Eduardo Costa <edu50...@gmail.com>: >>>>>>>> >>>>>>>> The documentation says that one should expect typed/racket to be >>>>>>>> faster >>>>>>>> than >>>>>>>> racket. I tested the math/matrix library and it seems to be almost as >>>>>>>> slow >>>>>>>> in typed/racket as in racket. >>>>>>> >>>>>>> >>>>>>> What was (is?) slow was a call in an untyped module A to a function >>>>>>> exported >>>>>>> from a typed module B. The functions in B must check at runtime that >>>>>>> the values coming from A are of the correct type. If the A was written >>>>>>> in Typed Racket, the types would be known at compile time. >>>>>>> >>>>>>> Here math/matrix is written in Typed Racket, so if you are writing an >>>>>>> untyped module, you will in general want to minimize the use of,say, >>>>>>> maxtrix-ref. Instead operations that works on entire matrices or >>>>>>> row/columns are preferred. >>>>>>> >>>>>>>> (: sum : Integer Integer -> Flonum) >>>>>>>> (define (sum i n) >>>>>>>> (let loop ((j 0) (acc 0.0)) >>>>>>>> (if (>= j mx) acc >>>>>>>> (loop (+ j 1) (+ acc (matrix-ref A i j))) ))) >>>>>>>> >>>>>>>> (: b : (Matrix Flonum)) >>>>>>>> (define b (build-matrix mx 1 sum)) >>>>>>> >>>>>>> >>>>>>> The matrix b contains the sums of each row in the matrix. >>>>>>> Since matrices are a subset of arrays, you can use array-axis-sum, >>>>>>> which computes sum along a given axis (i.e. a row or a column when >>>>>>> speaking of matrices). >>>>>>> >>>>>>> (define A (matrix [[0. 1. 2.] >>>>>>> [3. 4. 5.] >>>>>>> [6. 7. 8.]])) >>>>>>> >>>>>>>> (array-axis-sum A 1) >>>>>>> >>>>>>> - : (Array Flonum) >>>>>>> (array #[3.0 12.0 21.0]) >>>>>>> >>>>>>> However as Eric points out, matrix-solve is an O(n^3) algorithm, >>>>>>> so the majority of the time is spent in matrix-solve. >>>>>>> >>>>>>> Apart from finding a way to exploit the relationship between your >>>>>>> matrix A and the column vector b, I see no obvious way of >>>>>>> speeding up the code. >>>>>>> >>>>>>> Note that when you benchmark with >>>>>>> >>>>>>> time racket matrix.rkt >>>>>>> >>>>>>> you will include startup and compilation time. >>>>>>> Therefore if you want to time the matrix code, >>>>>>> insert a literal (time ...) call. >>>>>>> >>>>>>> -- >>>>>>> Jens Axel Søgaard >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> -- >>>>> Jens Axel Søgaard >>>> >>>> >>>> >>>> >>> >>> ____________________ >>> Racket Users list: >>> http://lists.racket-lang.org/users >> >> >> >> -- >> -- >> Jens Axel Søgaard >> >> ____________________ >> Racket Users list: >> http://lists.racket-lang.org/users -- -- Jens Axel Søgaard ____________________ Racket Users list: http://lists.racket-lang.org/users