Have you repeated the benchmark to see whether
the numbers are reproducible?  If they are,
it could be that the byte counts involved are
in the ratio 1:4:8 (1-byte booleans, 4-byte integers,
8-byte floats) and the processing is more efficient
on larger blocks of memory.
(But I note that the
same benchmark on my 2.2 GHz AMD 3200+ gives the
same number for all three, 2.35 cycles per byte.)
Or else the compiler Jsoftware is using does
not have a good implementation of memset() for
the PowerPC (see below).  Or else the AMD
Athlon64 is simply a superior architecture for this.
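For reference, the cycles-per-byte figures in this thread are clock rate times elapsed time (from 6!:2) divided by bytes written. Writing f for the clock rate in Hz, t for the elapsed seconds, m for the number of elements and k for bytes per element (1, 4, 8 for m$0, m$2, m$0.2, assuming a 32-bit J where integers are 4 bytes), the formula and the times it implies for m$0 with m=1e7 are:

```latex
c = \frac{f\,t}{k\,m}
\qquad\Longrightarrow\qquad
t = \frac{c\,k\,m}{f}

% G5, m$0:  t = 6.71018 \times 10^{7} / (2 \times 10^{9})   \approx 0.0336\ \text{s}
% AMD, m$0: t = 2.34726 \times 10^{7} / (2.2 \times 10^{9}) \approx 0.0107\ \text{s}
```

So a constant c across the three cases means m$2 and m$0.2 take roughly 4 and 8 times as long as m$0, in proportion to the bytes written.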

As a side note, the numbers I got

> For example, on an ordinary (not overclocked)
> 2.2 Ghz AMD 3200+ machine,
> 
>    (1*m) %~ 2.2e9 * 6!:2 'm$0'   [ m=: 1e7
> 2.34726
>    (4*m) %~ 2.2e9 * 6!:2 'm$2'   [ m=: 1e7
> 2.3531
>    (8*m) %~ 2.2e9 * 6!:2 'm$0.2' [ m=: 1e7
> 2.35242

make me think that a speed-up should be possible
for m$0.  The problem is that J currently uses
the C routine memset(), which is promised to be
an "efficient way to ... set blocks of memory".
But I think I can beat the current implementation.

----- Original Message -----
From: Mike Powell <[EMAIL PROTECTED]>
Date: Friday, December 22, 2006 7:07 am
Subject: Re: [Jprogramming] Cycles per Byte

> Roger, help me explain this on a 2 GHz Mac G5 PowerPC:
> 
> 
>    (1*m) %~ 2e9 * 6!:2 'm$0'   [ m=: 1e7
> 6.71018
>    (4*m) %~ 2e9 * 6!:2 'm$2'   [ m=: 1e7
> 6.65046
>    (8*m) %~ 2e9 * 6!:2 'm$0.2'   [ m=: 1e7
> 6.4737


----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm