Re: [racket-dev] can we write these four lines of C in performant racket?

2011-07-25 Thread Matthew Flatt
At Sat, 23 Jul 2011 14:42:15 -0400, John Clements wrote:
 This C code adds the content of one buffer to another one, with no checking.  
 The corresponding racket code runs about 10x slower. Do you folks think that 
 it 
 should be possible to do better? (One salient fact: these are 
 shorts--16-bit-ints--not 32-bit ints.)

My sources say no --- at least not a lot better with the current JIT.

 void addOn(short *dst, int dstOffset, short *src, int srcOffset, int len) {
   int i;
   for (i = 0; ilen; i++) {
 dst[i+dstOffset] += src[i+srcOffset];
   }}

One problem is that a `s16-vector' in Racket is a struct that wraps an
FFI wrapper around the array pointer, instead of just an array pointer.
So, a C version of the Racket loop is more like

 void addOn(short_ptr *dst, int dstOffset, 
short_ptr *src, int srcOffset, int len) {
   int i;
   for (i = 0; ilen; i++) {
 dst-a-a[i+dstOffset] += src-a-a[i+srcOffset];
   }
 }

Another problem is that the JIT doesn't use registers well enough, so
its code is closer in this case to unoptimized gcc output. Finally,
there's some extra overhead for the fixnum encoding.

Here are some timings for 1000 iterations on 2^20-element inputs
(32-bit mode, Mac Book Pro 2.53 GHz):

  C as above, gcc -02  :  1409
  C with indirections, gcc -O2 :  4041
  C as above, gcc -O0  :  6425
  C with indirections, gcc -O0 :  8480
  Racket fxvector (more direct):  8883
  Racket   : 11248

I can tweak the JIT in small ways to make a small improvement:

  Racket with JIT tweaks   : 10670


The Racket code I tried is below (ugly to run as fast as I could make
it).



#lang racket/base
(require racket/unsafe/ops
 ffi/vector)

(define SIZE 1048576)

(define addOn #f) ; defeat inlining of `addOn'
(set! addOn
  (lambda (dst dstOffset src srcOffset len)
(let loop ([i 0])
  (unless (eq? i len)
(let ([d (unsafe-fx+ dstOffset i)])
  (unsafe-s16vector-set! 
   dst 
   d
   (unsafe-fx+
(unsafe-s16vector-ref dst d)
(unsafe-s16vector-ref src (unsafe-fx+ srcOffset i)
(loop (unsafe-fx+ i 1))

(let ([a (make-s16vector SIZE)]
  [b (make-s16vector SIZE)])
  (time
   (for ([i (in-range 1000)])
 (addOn a 0 b 0 SIZE

_
  For list-related administrative tasks:
  http://lists.racket-lang.org/listinfo/dev


[racket-dev] can we write these four lines of C in performant racket?

2011-07-23 Thread John Clements
This C code adds the content of one buffer to another one, with no checking.  
The corresponding racket code runs about 10x slower. Do you folks think that it 
should be possible to do better? (One salient fact: these are 
shorts--16-bit-ints--not 32-bit ints.)

John



void addOn(short *dst, int dstOffset, short *src, int srcOffset, int len) {
  int i;
  for (i = 0; ilen; i++) {
dst[i+dstOffset] += src[i+srcOffset];
  }}



smime.p7s
Description: S/MIME cryptographic signature
_
  For list-related administrative tasks:
  http://lists.racket-lang.org/listinfo/dev

Re: [racket-dev] can we write these four lines of C in performant racket?

2011-07-23 Thread Robby Findler
What is the data you're using to represent the shorts in Racket?

Robby

On Sat, Jul 23, 2011 at 1:42 PM, John Clements
cleme...@brinckerhoff.org wrote:
 This C code adds the content of one buffer to another one, with no checking.  
 The corresponding racket code runs about 10x slower. Do you folks think that 
 it should be possible to do better? (One salient fact: these are 
 shorts--16-bit-ints--not 32-bit ints.)

 John



 void addOn(short *dst, int dstOffset, short *src, int srcOffset, int len) {
  int i;
  for (i = 0; ilen; i++) {
    dst[i+dstOffset] += src[i+srcOffset];
  }}


 _
  For list-related administrative tasks:
  http://lists.racket-lang.org/listinfo/dev


_
  For list-related administrative tasks:
  http://lists.racket-lang.org/listinfo/dev

Re: [racket-dev] can we write these four lines of C in performant racket?

2011-07-23 Thread John Clements

On Jul 23, 2011, at 2:46 PM, Robby Findler wrote:

 What is the data you're using to represent the shorts in Racket?

#s16vectors.


John

 
 Robby
 
 On Sat, Jul 23, 2011 at 1:42 PM, John Clements
 cleme...@brinckerhoff.org wrote:
 This C code adds the content of one buffer to another one, with no checking. 
  The corresponding racket code runs about 10x slower. Do you folks think 
 that it should be possible to do better? (One salient fact: these are 
 shorts--16-bit-ints--not 32-bit ints.)
 
 John
 
 
 
 void addOn(short *dst, int dstOffset, short *src, int srcOffset, int len) {
  int i;
  for (i = 0; ilen; i++) {
dst[i+dstOffset] += src[i+srcOffset];
  }}
 
 
 _
  For list-related administrative tasks:
  http://lists.racket-lang.org/listinfo/dev
 



smime.p7s
Description: S/MIME cryptographic signature
_
  For list-related administrative tasks:
  http://lists.racket-lang.org/listinfo/dev