[racket-dev] speeding up 16-bit integer adds

2010-09-23 Thread John Clements
I'm trying to add together big buffers. The following code creates two big fat 
buffers of 16-bit integers, and adds them together destructively. It looks to 
me like this code *could* run really fast, but it doesn't; this takes about 8.5 
seconds. Changing + to unsafe-fx+ has no detectable effect.  Is there 
allocation going on in the inner loop? I'd hoped that since an _sint16 fits 
safely in 31 bits, that no memory would be allocated in the inner loop. Grr! 
Any suggestions? (I ran a similar test on floats, and C ran about 64x faster, 
about a tenth of a second).

Doc pointers appreciated as always,

John

#lang racket

(require ffi/unsafe)

;; Despite the name, every element is set to the constant 73.
(define (make-buffer-of-small-random-ints len)
  (let ([buf (malloc _sint16 len)])
    (for ([i (in-range len)])
      (ptr-set! buf _sint16 i 73))
    buf))

;; 200 seconds of 44.1 kHz stereo samples
(define buf-len (* 44100 2 200))

(define b1 (make-buffer-of-small-random-ints buf-len))
(define b2 (make-buffer-of-small-random-ints buf-len))

;; Add b2 into b1 in place, one sample at a time.
(time
 (for ([i (in-range buf-len)])
   (ptr-set! b1 _sint16 i
             (+ (ptr-ref b1 _sint16 i)
                (ptr-ref b2 _sint16 i)))))

_
  For list-related administrative tasks:
  http://lists.racket-lang.org/listinfo/dev

Re: [racket-dev] speeding up 16-bit integer adds

2010-09-23 Thread Matthew Flatt
I think the problem is that the `ptr-ref' and `ptr-set!' operations are
slow. They are slow because they are not yet inlined by the JIT, and
they're not yet inlined because they have complicated APIs (including a
pointer datatype with many variants).

I haven't worked out a way to make them faster or a way to provide
faster variants, but it's on my list.

At Thu, 23 Sep 2010 19:42:15 -0700, John Clements wrote:
> I'm trying to add together big buffers. The following code creates two big
> fat buffers of 16-bit integers, and adds them together destructively. It
> looks to me like this code *could* run really fast, but it doesn't; this
> takes about 8.5 seconds. Changing + to unsafe-fx+ has no detectable effect.
> Is there allocation going on in the inner loop? I'd hoped that since an
> _sint16 fits safely in 31 bits, that no memory would be allocated in the
> inner loop. Grr! Any suggestions? (I ran a similar test on floats, and C ran
> about 64x faster, about a tenth of a second).
>
> Doc pointers appreciated as always,
>
> John
>
> #lang racket
>
> (require ffi/unsafe)
>
> (define (make-buffer-of-small-random-ints len)
>   (let ([buf (malloc _sint16 len)])
>     (for ([i (in-range len)])
>       (ptr-set! buf _sint16 i 73))
>     buf))
>
> (define buf-len (* 44100 2 200))
>
> (define b1 (make-buffer-of-small-random-ints buf-len))
> (define b2 (make-buffer-of-small-random-ints buf-len))
>
> (time
>  (for ([i (in-range buf-len)])
>    (ptr-set! b1 _sint16 i
>              (+ (ptr-ref b1 _sint16 i)
>                 (ptr-ref b2 _sint16 i)))))



Re: [racket-dev] speeding up 16-bit integer adds

2010-09-23 Thread John Clements

On Sep 23, 2010, at 8:16 PM, Matthew Flatt wrote:

> One more thought: Do you get to pick whether you use 16-bit integers or
> 64-bit floating-point numbers? The `flvector-' and `f64vector-'
> operations are inlined by the JIT and recognized for unboxing, so using
> flonum vectors and operations could be much faster than using raw
> pointers and 16-bit integers.

Well, that's an option, albeit a somewhat unappetizing one; as the 44100 in my 
code no doubt signaled, I'm reading and writing sound data here, and both 
16-bit ints and 32-bit floats are fairly common. 64-bit floats will be another 
factor of 2 in memory, for a total of 42 megabytes per minute.
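
The memory figure above can be checked with a few lines of arithmetic. This is only a sketch: `audio-bytes` is a hypothetical helper, not something from the thread, and it assumes the 44100 Hz stereo setup implied by the thread's `(* 44100 2 ...)` buffer lengths.

```racket
#lang racket

;; Hypothetical helper (not from the thread): bytes of raw sample data
;; needed for `seconds` of audio. The defaults assume 44100 Hz stereo.
(define (audio-bytes seconds bytes-per-sample
                     #:sample-rate [sample-rate 44100]
                     #:channels [channels 2])
  (* seconds sample-rate channels bytes-per-sample))

(audio-bytes 60 2) ; 16-bit ints:   10584000 bytes per minute
(audio-bytes 60 4) ; 32-bit floats: 21168000 bytes per minute
(audio-bytes 60 8) ; 64-bit floats: 42336000 bytes per minute
```

Under those assumptions, one minute of 64-bit samples comes to roughly 42 megabytes, which is four times the cost of 16-bit samples and twice the cost of 32-bit floats.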

I ran some tests, using flvectors and unsafe operations everywhere. (Code 
below.)

My tests called for 400 seconds of audio, or 282 Megabytes, and this made 
DrRacket flustered.  Restarting and running with half that size yielded (quite 
variable) times between 1 and 3 seconds, so that appears about twice as fast as 
the fixed-point one.

I'm tempted to write a little C code, but then of course I have to compile it 
separately for every darn platform.

Thanks again for your help,

John


#lang racket

(require ffi/unsafe
         racket/flonum
         racket/unsafe/ops)

;; Despite the name, every element is set to the constant 0.73.
(define (make-buffer-of-small-randoms len)
  (let ([buf (make-flvector len)])
    (for ([i (in-range len)])
      (unsafe-flvector-set! buf i 0.73))
    buf))

;; 100 seconds of 44.1 kHz stereo samples (half the earlier test size)
(define buf-len (* 44100 2 100))

(define b1 (make-buffer-of-small-randoms buf-len))
(define b2 (make-buffer-of-small-randoms buf-len))

;; Add b2 into b1 in place, one sample at a time.
(time
 (for ([i (in-range buf-len)])
   (unsafe-flvector-set! b1 i
                         (unsafe-fl+ (unsafe-flvector-ref b1 i)
                                     (unsafe-flvector-ref b2 i)))))


Re: [racket-dev] speeding up 16-bit integer adds

2010-09-23 Thread John Clements

On Sep 23, 2010, at 9:46 PM, John Clements wrote:

 
> On Sep 23, 2010, at 8:16 PM, Matthew Flatt wrote:
>
>> One more thought: Do you get to pick whether you use 16-bit integers or
>> 64-bit floating-point numbers? The `flvector-' and `f64vector-'
>> operations are inlined by the JIT and recognized for unboxing, so using
>> flonum vectors and operations could be much faster than using raw
>> pointers and 16-bit integers.
>
> Well, that's an option, albeit a somewhat unappetizing one; as the 44100 in
> my code no doubt signaled, I'm reading and writing sound data here, and both
> 16-bit ints and 32-bit floats are fairly common. 64-bit floats will be
> another factor of 2 in memory, for a total of 42 megabytes per minute.
>
> I ran some tests, using flvectors and unsafe operations everywhere. (Code
> below.)

Update before going to bed; re-running the C tests with doubles everywhere and 
the same setup (simply adding together two big buffers) took about half a 
second, so in fact in this instance Racket is less than 10x slower, which is as
fast as I would expect it to be.  So basically, it sounds like the flvectors 
are the way to go, if I can stomach the memory usage.

Thanks again,

John
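
For readers who want to try the flvector approach without the unsafe operations, here is a minimal sketch using only the checked `racket/flonum` API. The helper name `flvector-add!` is hypothetical (it is not part of Racket or of the code posted in this thread), and no timing claims are made for it; it only illustrates the in-place add discussed above.

```racket
#lang racket

(require racket/flonum)

;; Hypothetical helper: adds each element of src into dst, in place.
;; Uses the checked flonum operations rather than the unsafe ones.
(define (flvector-add! dst src)
  (unless (= (flvector-length dst) (flvector-length src))
    (raise-arguments-error 'flvector-add! "flvector lengths differ"
                           "dst length" (flvector-length dst)
                           "src length" (flvector-length src)))
  (for ([i (in-range (flvector-length dst))])
    (flvector-set! dst i (fl+ (flvector-ref dst i)
                              (flvector-ref src i)))))

(define a (flvector 0.25 0.5 0.75))
(define b (flvector 1.0 1.0 1.0))
(flvector-add! a b)
;; a now holds 1.25, 1.5 and 1.75; b is unchanged
```

Swapping in the `unsafe-` variants from `racket/unsafe/ops`, as in the posted test, removes the per-access checks, so the length check at the top becomes the only guard against out-of-range access.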


