This and the preceding message were originally posted on the J forum 
a bit over a year ago. This Russian guy made the most insightful 
comments.

- joey

At 02:21  -0800 2007/12/10, Viktor Cerovski wrote:


>
>      ts =: 6!:2 , 7!:2@]
>
>  On Linux box -
>
>      10 ts '%. 500 500 ?@$ 1000'
>  1.07966 1.57304e7
>
>  On Mac -
>
>      10 ts '%. 500 500 ?@$ 1000'
>  0.492236 1.57304e7
>
>  Which surprises me since one would guess the 3.00 GHz machine
>  would be faster than the 2.4 GHz machine - instead it is half
>  the speed...
>
I think that processor frequency is not so important nowadays.

Here is the same test done on a mobile AMD Turion; it is a
64-bit chip, but running in 32-bit mode under Windows XP:

     10 ts '%. 500 500 ?@$ 1000'
0.7346335079 15730368
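(For readers outside J: the ts verb above runs a sentence n times and reports the average time and the space used. A rough analogue in Python -- just a sketch of the harness, not of the matrix inversion itself, since plain Python has no built-in equivalent of %. -- would be:

```python
import timeit

# Rough analogue of J's ts (6!:2 , 7!:2@]) timing harness:
# run a snippet n times and report the average time per run.
def avg_time(stmt, n=10, setup="pass"):
    total = timeit.timeit(stmt, setup=setup, number=n)
    return total / n

# Stand-in workload for illustration only (NOT a matrix inversion):
t = avg_time("sum(range(500 * 500))", n=10)
print(f"{t:.6f} s per run")
```

Averaging over several runs, as 10 ts does, smooths out cache-warming and scheduler noise in exactly the benchmarks discussed here.)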

This Turion runs at only 1.79 GHz with only 512 KB of cache, yet it is
faster than the Pentium at 3 GHz with a 2 MB cache.

One interpretation: the matrix inversion works in double-precision
floating-point arithmetic, and each number takes 8 bytes, so we
have roughly a 2 MB chunk of memory to process.
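That 2 MB figure follows directly from the sizes in the test above (Python here just for the arithmetic):

```python
# Footprint of the 500x500 double-precision matrix being inverted:
rows = cols = 500
bytes_per_double = 8
footprint = rows * cols * bytes_per_double
print(footprint)  # 2000000 bytes, i.e. about 2 MB
# J's space report above (1.57304e7, about 15.7 MB) is larger,
# presumably because %. allocates working copies during inversion.
```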

Cache size plays an important role, but in order for data to arrive
in the cache, it first has to be transferred from RAM.
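One can probe that RAM-transfer rate directly. A hedged sketch (the 64 MB size is an assumption, chosen only to be far larger than any cache mentioned here, so the traffic cannot be served from cache):

```python
import time

# Crude memory-bandwidth probe: copy a buffer much larger than
# any cache, so the bytes must stream through RAM.
size = 64 * 1024 * 1024          # 64 MB, well beyond a 2 MB cache
src = bytearray(size)

start = time.perf_counter()
dst = bytes(src)                 # one full read-and-write pass
elapsed = time.perf_counter() - start

mb = size / 1e6
print(f"copied {mb:.0f} MB in {elapsed:.4f} s ({mb / elapsed:.0f} MB/s)")
```

This measures the copy path through the interpreter, so it understates peak hardware bandwidth, but it is enough to compare two machines the way the timings above do.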

Then my guess for the difference between the Turion and the Pentium
(since they are both in 32-bit mode) would be that data transfer
to the processor is faster on AMD.  In fact, AMD builds a separate
memory controller into its processors, which speeds up data
transfer into the CPU (by the way, it seems that AMD was ahead
of Intel in FP-intensive calculations prior to Core 2 Duo --
I mean Opteron vs Xeon, and maybe also Athlon vs Pentium).

This AMD machine has 333 MHz DDR memory, and it would be interesting
to know what kind of memory is in your machines.  I would venture
to guess that your Pentium might be equipped with 166 MHz SDRAM,
and that its 2 MB cache is actually split into 1 MB per core.

In number crunching applications, especially with large amounts
of data, I would put hardware parts roughly in the following
order according to their importance:

1. RAM subsystem (speed and memory controller)

2. cache size

3. CPU frequency

This order can change depending on the size and type of the problem,
so it is very tentative.  If the whole program (like the kernel
of K) and perhaps even the data fit into the cache, then raw
frequency and cache size alone buy us more.

Faster FP throughput per clock is naturally always a big bonus,
and 64-bit mode does bring larger transfers and faster FP,
but of course the 64/32 = 2 ratio alone does not automatically
translate into 2x faster code.

Also, according to some very unsophisticated tests I've done,
several Linux distributions (Gentoo, Debian and Fedora) achieve
about 10%-15% higher memory transfer rates than Windows XP
(both 32 and 64 bit).

-- 
View this message in context: 
http://www.nabble.com/Computer-Language-Benchmarks-Game-tp14087924s24193p14250266.html
Sent from the J Programming mailing list archive at Nabble.com.

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
