Re: [racket-dev] speeding up 16-bit integer adds
On Fri, Sep 24, 2010 at 3:42 AM, John Clements wrote:
> the inner loop. Grr! Any suggestions?

Inline assembly? It works and is easy to do -- you'll need to extend
http://github.com/noelwelsh/assembler/ with jumps. I'm serious.

N.

For list-related administrative tasks: http://lists.racket-lang.org/listinfo/dev
Re: [racket-dev] speeding up 16-bit integer adds
On Sep 23, 2010, at 9:46 PM, John Clements wrote:

> On Sep 23, 2010, at 8:16 PM, Matthew Flatt wrote:
>
>> One more thought: Do you get to pick whether you use 16-bit integers or
>> 64-bit floating-point numbers? The `flvector-' and `f64vector-'
>> operations are inlined by the JIT and recognized for unboxing, so using
>> flonum vectors and operations could be much faster than using raw
>> pointers and 16-bit integers.
>
> Well, that's an option, albeit a somewhat unappetizing one; as the 44100 in
> my code no doubt signaled, I'm reading and writing sound data here, and both
> 16-bit ints and 32-bit floats are fairly common. 64-bit floats will be
> another factor of 2 in memory, for a total of 42 megabytes per minute.
>
> I ran some tests, using flvectors and unsafe operations everywhere. (Code
> below.)

Update before going to bed: re-running the C tests with doubles everywhere
and the same setup (simply adding together two big buffers) took about half
a second, so in fact in this instance Racket is less than 10x slower, which
is as fast as I would expect it to be.

So basically, it sounds like the flvectors are the way to go, if I can
stomach the memory usage.

Thanks again,

John
Re: [racket-dev] speeding up 16-bit integer adds
On Sep 23, 2010, at 8:16 PM, Matthew Flatt wrote:

> One more thought: Do you get to pick whether you use 16-bit integers or
> 64-bit floating-point numbers? The `flvector-' and `f64vector-'
> operations are inlined by the JIT and recognized for unboxing, so using
> flonum vectors and operations could be much faster than using raw
> pointers and 16-bit integers.

Well, that's an option, albeit a somewhat unappetizing one; as the 44100 in
my code no doubt signaled, I'm reading and writing sound data here, and both
16-bit ints and 32-bit floats are fairly common. 64-bit floats will be
another factor of 2 in memory, for a total of 42 megabytes per minute.

I ran some tests, using flvectors and unsafe operations everywhere. (Code
below.) My tests called for 400 seconds of audio, or 282 megabytes, and this
made DrRacket flustered. Restarting and running with half that size yielded
(quite variable) times between 1 and 3 seconds, so that appears to be about
twice as fast as the fixed-point version.

I'm tempted to write a little C code, but then of course I have to compile
it separately for every darn platform.

Thanks again for your help,

John

#lang racket

(require ffi/unsafe
         racket/flonum
         racket/unsafe/ops)

(define (make-buffer-of-small-randoms len)
  (let ([buf (make-flvector len)])
    (for ([i (in-range len)])
      (unsafe-flvector-set! buf i 0.73))
    buf))

(define buf-len (* 44100 2 100))

(define b1 (make-buffer-of-small-randoms buf-len))
(define b2 (make-buffer-of-small-randoms buf-len))

(time
 (for ([i (in-range buf-len)])
   (unsafe-flvector-set!
    b1 i
    (unsafe-fl+ (unsafe-flvector-ref b1 i)
                (unsafe-flvector-ref b2 i)))))
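The memory figures in this message can be checked with a little arithmetic. The sketch below is not from the thread: the helper name `audio-bytes` and its keyword arguments are made up for illustration, and it assumes the `2` in `(* 44100 2 ...)` is a stereo channel count at 44100 samples per second.

```racket
#lang racket

;; Hypothetical helper (not from the thread): bytes needed to hold
;; `seconds` of audio. The defaults assume 44100 Hz stereo.
(define (audio-bytes seconds
                     #:rate [rate 44100]
                     #:channels [channels 2]
                     #:bytes-per-sample bytes-per-sample)
  (* seconds rate channels bytes-per-sample))

;; One minute of audio in each of the sample formats mentioned.
(define s16-per-minute (audio-bytes 60 #:bytes-per-sample 2))
(define f32-per-minute (audio-bytes 60 #:bytes-per-sample 4))
(define f64-per-minute (audio-bytes 60 #:bytes-per-sample 8))

;; The 400 seconds of 64-bit floats used in the first test run.
(define f64-400-seconds (audio-bytes 400 #:bytes-per-sample 8))

(printf "s16: ~a bytes/min\n" s16-per-minute)
(printf "f32: ~a bytes/min\n" f32-per-minute)
(printf "f64: ~a bytes/min\n" f64-per-minute)
(printf "f64, 400 s: ~a bytes\n" f64-400-seconds)
```

Under those assumptions the 64-bit figure comes to 42,336,000 bytes per minute, twice the 32-bit float figure and four times the 16-bit one, and 400 seconds comes to 282,240,000 bytes, matching the roughly 42 and 282 megabytes quoted.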
Re: [racket-dev] speeding up 16-bit integer adds
On Sep 23, 2010, at 7:55 PM, Matthew Flatt wrote:

> I think the problem is that the `ptr-ref' and `ptr-set!' operations are
> slow. They are slow because they are not yet inlined by the JIT, and
> they're not yet inlined because they have complicated APIs (including a
> "pointer" datatype with many variants).
>
> I haven't worked out a way to make them faster or a way to provide
> faster variants, but it's on my list.

Okay, thanks. FWIW, my attempt to use the s16vector variants performs
similarly; perhaps these primitives call the same code.

John Clements
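The s16vector attempt itself is not shown in the thread. As a rough guess at what it might have looked like, the sketch below rewrites the original `ptr-ref`/`ptr-set!` loop with `make-s16vector`, `s16vector-ref`, and `s16vector-set!` from `ffi/vector`. The helper name `make-s16-buffer` is hypothetical, and the buffer size is copied from the original 16-bit test.

```racket
#lang racket

(require ffi/vector)

;; Hypothetical helper (not from the thread): an s16vector of length
;; `len` with every element set to 73, mirroring the original test.
(define (make-s16-buffer len)
  (let ([buf (make-s16vector len)])
    (for ([i (in-range len)])
      (s16vector-set! buf i 73))
    buf))

;; Same size as the original ptr-ref test: 200 seconds of 44100 Hz
;; two-channel audio.
(define buf-len (* 44100 2 200))

(define b1 (make-s16-buffer buf-len))
(define b2 (make-s16-buffer buf-len))

;; Add b2 into b1 element by element, destructively. The sums (146)
;; stay well inside the signed 16-bit range.
(time
 (for ([i (in-range buf-len)])
   (s16vector-set! b1 i
                   (+ (s16vector-ref b1 i)
                      (s16vector-ref b2 i)))))
```

Timings will depend heavily on the Racket version, so this is only useful for comparing against the `ptr-ref` version on the same installation.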
Re: [racket-dev] speeding up 16-bit integer adds
One more thought: Do you get to pick whether you use 16-bit integers or
64-bit floating-point numbers? The `flvector-' and `f64vector-'
operations are inlined by the JIT and recognized for unboxing, so using
flonum vectors and operations could be much faster than using raw
pointers and 16-bit integers.

At Thu, 23 Sep 2010 19:42:15 -0700, John Clements wrote:
> I'm trying to add together big buffers. The following code creates two big
> fat buffers of 16-bit integers, and adds them together destructively. It
> looks to me like this code *could* run really fast, but it doesn't; this
> takes about 8.5 seconds. Changing + to unsafe-fx+ has no detectable effect.
> Is there allocation going on in the inner loop? I'd hoped that since an
> _sint16 fits safely in 31 bits, that no memory would be allocated in the
> inner loop. Grr! Any suggestions? (I ran a similar test on floats, and C
> ran about 64x faster, about a tenth of a second).
>
> Doc pointers appreciated as always,
>
> John
>
> #lang racket
>
> (require ffi/unsafe)
>
> (define (make-buffer-of-small-random-ints len)
>   (let ([buf (malloc _sint16 len)])
>     (for ([i (in-range len)])
>       (ptr-set! buf _sint16 i 73))
>     buf))
>
> (define buf-len (* 44100 2 200))
>
> (define b1 (make-buffer-of-small-random-ints buf-len))
> (define b2 (make-buffer-of-small-random-ints buf-len))
>
> (time
>  (for ([i (in-range buf-len)])
>    (ptr-set! b1 _sint16 i
>              (+ (ptr-ref b1 _sint16 i)
>                 (ptr-ref b2 _sint16 i)))))
Re: [racket-dev] speeding up 16-bit integer adds
I think the problem is that the `ptr-ref' and `ptr-set!' operations are
slow. They are slow because they are not yet inlined by the JIT, and
they're not yet inlined because they have complicated APIs (including a
"pointer" datatype with many variants).

I haven't worked out a way to make them faster or a way to provide
faster variants, but it's on my list.

At Thu, 23 Sep 2010 19:42:15 -0700, John Clements wrote:
> I'm trying to add together big buffers. The following code creates two big
> fat buffers of 16-bit integers, and adds them together destructively. It
> looks to me like this code *could* run really fast, but it doesn't; this
> takes about 8.5 seconds. Changing + to unsafe-fx+ has no detectable effect.
> Is there allocation going on in the inner loop? I'd hoped that since an
> _sint16 fits safely in 31 bits, that no memory would be allocated in the
> inner loop. Grr! Any suggestions? (I ran a similar test on floats, and C
> ran about 64x faster, about a tenth of a second).
>
> Doc pointers appreciated as always,
>
> John
>
> #lang racket
>
> (require ffi/unsafe)
>
> (define (make-buffer-of-small-random-ints len)
>   (let ([buf (malloc _sint16 len)])
>     (for ([i (in-range len)])
>       (ptr-set! buf _sint16 i 73))
>     buf))
>
> (define buf-len (* 44100 2 200))
>
> (define b1 (make-buffer-of-small-random-ints buf-len))
> (define b2 (make-buffer-of-small-random-ints buf-len))
>
> (time
>  (for ([i (in-range buf-len)])
>    (ptr-set! b1 _sint16 i
>              (+ (ptr-ref b1 _sint16 i)
>                 (ptr-ref b2 _sint16 i)))))