Have you repeated the benchmark to see whether the numbers are repeatable? If they are, it could be that, because the byte counts involved are in the ratio 1:4:8, the processing is more efficient on the longer vectors. (But I note that the same benchmark on my 2.2 GHz AMD 3200+ gives the same number on all three, 2.35 cycles per byte.) Or else the compiler Jsoftware is using does not have a good implementation of memset() for the PowerPC (see below). Or else the AMD Athlon64 is a far superior architecture.
As a side note, the numbers I got

> For example, on an ordinary (not overclocked)
> 2.2 GHz AMD 3200+ machine,
>
>    (1*m) %~ 2.2e9 * 6!:2 'm$0' [ m=: 1e7
> 2.34726
>    (4*m) %~ 2.2e9 * 6!:2 'm$2' [ m=: 1e7
> 2.3531
>    (8*m) %~ 2.2e9 * 6!:2 'm$0.2' [ m=: 1e7
> 2.35242

make me think that a speed-up should be possible for m$0 . The problem is that J is currently using the memset() C routine, promised to be an "efficient way to ... set blocks of memory". But I think I can beat the current implementation.

----- Original Message -----
From: Mike Powell <[EMAIL PROTECTED]>
Date: Friday, December 22, 2006 7:07 am
Subject: Re: [Jprogramming] Cycles per Byte

> Roger, help me explain this on a 2 GHz Mac G5 PowerPC:
>
>    (1*m) %~ 2e9 * 6!:2 'm$0' [ m=: 1e7
> 6.71018
>    (4*m) %~ 2e9 * 6!:2 'm$2' [ m=: 1e7
> 6.65046
>    (8*m) %~ 2e9 * 6!:2 'm$0.2' [ m=: 1e7
> 6.4737

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
