Re: [racket-dev] can we write these four lines of C in performant racket?

2011-07-26 Thread Matthew Flatt
At Mon, 25 Jul 2011 08:05:46 -0400, Sam Tobin-Hochstadt wrote:
> On Mon, Jul 25, 2011 at 7:51 AM, Matthew Flatt  wrote:
> >
> > Here are some timings for 1000 iterations on 2^20-element inputs
> > (32-bit mode, MacBook Pro 2.53 GHz):
> >
> >  C as above, gcc -O2          :  1409
> >  C with indirections, gcc -O2 :  4041
> >  C as above, gcc -O0          :  6425
> >  C with indirections, gcc -O0 :  8480
> >  Racket fxvector (more direct):  8883
> >  Racket                       : 11248
> >
> > I can tweak the JIT in small ways to make a small improvement:
> >
> >  Racket with JIT tweaks       : 10670
> 
> What do the JIT tweaks do to the performance of the fxvector version?

The tweak doesn't change `fxvector-set!'.


_
  For list-related administrative tasks:
  http://lists.racket-lang.org/listinfo/dev

Re: [racket-dev] can we write these four lines of C in performant racket?

2011-07-25 Thread Sam Tobin-Hochstadt
On Mon, Jul 25, 2011 at 7:51 AM, Matthew Flatt  wrote:
>
> Here are some timings for 1000 iterations on 2^20-element inputs
> (32-bit mode, MacBook Pro 2.53 GHz):
>
>  C as above, gcc -O2          :  1409
>  C with indirections, gcc -O2 :  4041
>  C as above, gcc -O0          :  6425
>  C with indirections, gcc -O0 :  8480
>  Racket fxvector (more direct):  8883
>  Racket                       : 11248
>
> I can tweak the JIT in small ways to make a small improvement:
>
>  Racket with JIT tweaks       : 10670

What do the JIT tweaks do to the performance of the fxvector version?
-- 
sam th
sa...@ccs.neu.edu



Re: [racket-dev] can we write these four lines of C in performant racket?

2011-07-25 Thread Matthew Flatt
At Sat, 23 Jul 2011 14:42:15 -0400, John Clements wrote:
> This C code adds the content of one buffer to another one, with no checking.
> The corresponding racket code runs about 10x slower. Do you folks think that
> it should be possible to do better? (One salient fact: these are
> shorts--16-bit-ints--not 32-bit ints.)

My sources say no --- at least not a lot better with the current JIT.

> void addOn(short *dst, int dstOffset, short *src, int srcOffset, int len) {
>   int i;
>   for (i = 0; i < len; i++) {
>     dst[i+dstOffset] += src[i+srcOffset];
>   }}

One problem is that a `s16-vector' in Racket is a struct that wraps an
FFI wrapper around the array pointer, instead of just an array pointer.
So, a C version of the Racket loop is more like

 void addOn(short_ptr *dst, int dstOffset,
            short_ptr *src, int srcOffset, int len) {
   int i;
   for (i = 0; i < len; i++) {
     dst->a->a[i+dstOffset] += src->a->a[i+srcOffset];
   }
 }
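
The `short_ptr` type is not spelled out above. As a minimal sketch, assuming a hypothetical two-level layout (a struct that points to an FFI-style wrapper, which in turn holds the array pointer), the two variants can be put side by side. The names `ffi_wrapper`, `add_on_direct`, and `add_on_indirect` and the field layout are illustrative assumptions, not Racket's actual runtime definitions.

```c
#include <assert.h>

/* Hypothetical layout, for illustration only: these are not Racket's
   real runtime types. */
typedef struct { short *a; } ffi_wrapper;      /* FFI wrapper around the array */
typedef struct { ffi_wrapper *a; } short_ptr;  /* struct wrapping the FFI wrapper */

/* Direct version: the array pointers are the arguments themselves. */
void add_on_direct(short *dst, int dstOffset,
                   short *src, int srcOffset, int len) {
  int i;
  for (i = 0; i < len; i++)
    dst[i + dstOffset] += src[i + srcOffset];
}

/* Indirect version: every element access is written as two pointer
   dereferences (struct -> wrapper -> array) before the indexing. */
void add_on_indirect(short_ptr *dst, int dstOffset,
                     short_ptr *src, int srcOffset, int len) {
  int i;
  for (i = 0; i < len; i++)
    dst->a->a[i + dstOffset] += src->a->a[i + srcOffset];
}
```

Both functions compute the same result; they differ only in how many loads sit between the arguments and the array data.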

Another problem is that the JIT doesn't use registers well enough, so
its code is closer in this case to unoptimized gcc output. Finally,
there's some extra overhead for the fixnum encoding.
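
To make the fixnum overhead concrete, here is a rough C sketch of the same loop with every integer carried in tagged form. It assumes an encoding where the integer n is stored as (n << 1) | 1; that encoding and the helper names `fx_tag`, `fx_untag`, `fx_add`, and `add_on_tagged` are assumptions for illustration, not a description of the code the JIT actually emits.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encoding (illustrative): integer n is stored as (n << 1) | 1. */
static intptr_t fx_tag(intptr_t n) {
  return (intptr_t)(((uintptr_t)n << 1) | 1);
}

/* Relies on >> being an arithmetic shift for negative values, which is
   implementation-defined in C but holds on common compilers. */
static intptr_t fx_untag(intptr_t v) {
  return v >> 1;
}

/* Tagged addition: (2a+1) + (2b+1) - 1 == 2(a+b) + 1. */
static intptr_t fx_add(intptr_t a, intptr_t b) {
  return a + b - 1;
}

/* Same loop as addOn, but indices and element values stay tagged and are
   untagged only where a raw machine integer is required. */
void add_on_tagged(short *dst, int dstOffset,
                   short *src, int srcOffset, int len) {
  intptr_t i = fx_tag(0), end = fx_tag(len);
  intptr_t d_off = fx_tag(dstOffset), s_off = fx_tag(srcOffset);
  while (i != end) {
    intptr_t d = fx_add(d_off, i);
    intptr_t s = fx_add(s_off, i);
    intptr_t sum = fx_add(fx_tag(dst[fx_untag(d)]),
                          fx_tag(src[fx_untag(s)]));
    dst[fx_untag(d)] = (short)fx_untag(sum);
    i = fx_add(i, fx_tag(1));
  }
}
```

Each element costs several extra shifts and adjustments compared with the plain C loop, which is the kind of overhead meant here.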

Here are some timings for 1000 iterations on 2^20-element inputs
(32-bit mode, MacBook Pro 2.53 GHz):

  C as above, gcc -O2          :  1409
  C with indirections, gcc -O2 :  4041
  C as above, gcc -O0          :  6425
  C with indirections, gcc -O0 :  8480
  Racket fxvector (more direct):  8883
  Racket                       : 11248

I can tweak the JIT in small ways to make a small improvement:

  Racket with JIT tweaks       : 10670
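
The C harness behind these numbers is not included in this message. One plausible way to set up such a measurement is sketched below; the function `time_add_on`, the volatile function pointer `add_on_ptr`, and the use of `clock()` (CPU time, reported in milliseconds) are assumptions of this sketch, not the harness that produced the timings above.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

static void addOn(short *dst, int dstOffset,
                  short *src, int srcOffset, int len) {
  int i;
  for (i = 0; i < len; i++)
    dst[i + dstOffset] += src[i + srcOffset];
}

/* Calling through a volatile function pointer keeps the compiler from
   inlining addOn into the timing loop. */
static void (*volatile add_on_ptr)(short *, int, short *, int, int) = addOn;

/* Runs addOn `iterations` times over `size`-element buffers.
   Returns elapsed CPU time in milliseconds, or -1.0 if allocation fails.
   Stores the first element of the destination in *first_out so the work
   is observable to the caller. */
double time_add_on(int iterations, int size, short *first_out) {
  short *a = calloc((size_t)size, sizeof(short));
  short *b = malloc((size_t)size * sizeof(short));
  clock_t start, end;
  int i, k;
  if (a == NULL || b == NULL) {
    free(a);
    free(b);
    return -1.0;
  }
  for (i = 0; i < size; i++)
    b[i] = 1;
  start = clock();
  for (k = 0; k < iterations; k++)
    add_on_ptr(a, 0, b, 0, size);
  end = clock();
  *first_out = a[0];
  free(a);
  free(b);
  return 1000.0 * (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `time_add_on(1000, 1 << 20, &first)` would correspond to 1000 iterations on 2^20-element inputs.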


The Racket code I tried is below (it is ugly, so that it runs as fast as
I could make it).



#lang racket/base
(require racket/unsafe/ops
         ffi/vector)

(define SIZE 1048576)

(define addOn #f) ; defeat inlining of `addOn'
(set! addOn
      (lambda (dst dstOffset src srcOffset len)
        (let loop ([i 0])
          (unless (eq? i len)
            (let ([d (unsafe-fx+ dstOffset i)])
              (unsafe-s16vector-set!
               dst
               d
               (unsafe-fx+
                (unsafe-s16vector-ref dst d)
                (unsafe-s16vector-ref src (unsafe-fx+ srcOffset i)))))
            (loop (unsafe-fx+ i 1))))))

(let ([a (make-s16vector SIZE)]
      [b (make-s16vector SIZE)])
  (time
   (for ([i (in-range 1000)])
     (addOn a 0 b 0 SIZE))))



Re: [racket-dev] can we write these four lines of C in performant racket?

2011-07-24 Thread Noel Welsh
Is the C code compiled to vectorised assembler? That could account for
a factor of about 4-16 depending.
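
For reference, a form of the loop that compilers tend to find easier to vectorize is sketched below. It uses C99 `restrict` to promise that the buffers do not overlap and drops the offsets; the name `add_on_restrict` is hypothetical. Whether a given compiler vectorizes it depends on version and flags: with GCC, auto-vectorization has traditionally been enabled by -O3 or -ftree-vectorize rather than by -O2. A 128-bit SSE register holds eight 16-bit values.

```c
#include <assert.h>

/* `restrict` tells the compiler that dst and src never alias, removing one
   common obstacle to auto-vectorization. Callers must honor that promise. */
void add_on_restrict(short *restrict dst, const short *restrict src, int len) {
  int i;
  for (i = 0; i < len; i++)
    dst[i] = (short)(dst[i] + src[i]);
}
```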

N.

On Sat, Jul 23, 2011 at 7:42 PM, John Clements
 wrote:
> This C code adds the content of one buffer to another one, with no checking.  
> The corresponding racket code runs about 10x slower. Do you folks think that 
> it should be possible to do better? (One salient fact: these are 
> shorts--16-bit-ints--not 32-bit ints.)
>
> John



Re: [racket-dev] can we write these four lines of C in performant racket?

2011-07-23 Thread John Clements

On Jul 23, 2011, at 2:46 PM, Robby Findler wrote:

> What is the data you're using to represent the shorts in Racket?

#<s16vector>s.


John

> 
> Robby
> 
> On Sat, Jul 23, 2011 at 1:42 PM, John Clements
>  wrote:
>> This C code adds the content of one buffer to another one, with no checking. 
>>  The corresponding racket code runs about 10x slower. Do you folks think 
>> that it should be possible to do better? (One salient fact: these are 
>> shorts--16-bit-ints--not 32-bit ints.)
>> 
>> John
>> 
>> 
>> 
>> void addOn(short *dst, int dstOffset, short *src, int srcOffset, int len) {
>>  int i;
>>  for (i = 0; i < len; i++) {
>>    dst[i+dstOffset] += src[i+srcOffset];
>>  }}
>> 
>> 
>> 




Re: [racket-dev] can we write these four lines of C in performant racket?

2011-07-23 Thread Robby Findler
What is the data you're using to represent the shorts in Racket?

Robby

On Sat, Jul 23, 2011 at 1:42 PM, John Clements
 wrote:
> This C code adds the content of one buffer to another one, with no checking.  
> The corresponding racket code runs about 10x slower. Do you folks think that 
> it should be possible to do better? (One salient fact: these are 
> shorts--16-bit-ints--not 32-bit ints.)
>
> John
>
>
>
> void addOn(short *dst, int dstOffset, short *src, int srcOffset, int len) {
>  int i;
>  for (i = 0; i < len; i++) {
>    dst[i+dstOffset] += src[i+srcOffset];
>  }}
>
>
>
