I have been skeptical of computer benchmarks for a long
time (40 or more years). If one has time to waste, they
can be amusing - but as an evaluation tool, they are, at
best, rough directional guides. I agree that the ones
that proscribe alternative approaches to solving a problem
are biased against j and other "think out of the box"
languages, or even against creative approaches to a
solution. But I have a different question to ask the forum -

NB. This message turned into a (perhaps too) long trip down
    memory lane, but there really is a question at the end
    about which I would like to hear some discussion - PLEASE
    don't quote this whole/long message in replies!

Starting all those 40 years ago, I was impressed by APL's
ability to invert a matrix. The mainframe APLSV system
I had at my disposal (running on a System/360 Model 50) could
invert a 20 20 matrix of random numbers in just a few seconds.
The other APL machine I had access to was a model 75 and it
was quite a bit faster. Soon I decided to increase the
size of my test matrix to 50x50 and the time to invert
that on a S/370-145 was around 11 seconds.

Over the years, I had several complaints about using such
a floating point intensive check to look at system/cpu
speed. But still, people tried it on various machines
and reported the interesting results. In June of 1985,
Roger Moore reported that the 50x50 inverse took 12.5
seconds running on an AT/370 card (this was an ISA card
in a PC/AT which emulated a S/370 and was running Sharp
APL) - nice that a PC could perform as well as a mainframe
model 145. Machines got better, and later in 1985 I noted
that the Amdahl/V8 we were using took only 0.2 seconds
to invert the 50 50 matrix.

In November of 1985, at COMDEX, I got a chance to try
my test in a Unix APL from STSC running on a Motorola
68010 clocked at 8 MHz (my how speeds have changed!) and
it took about 100 seconds to invert the 50x50... In 1987
I had a chance to use a brand new Hitachi mainframe with
Sharp APL and it was able to invert a 100x100 in under 1
second, so at that time, mainframes still ruled - but the
exponential nature of Moore's Law gave them a relatively
short time to keep that dominance...

Then came j -- here is an email from Eugene McDonnell (in
his usual amazingly thorough style) to Roger reporting
on results from several different machines (this message
was an expansion on one he had sent 4 days earlier):

 no. 4353297 filed 17.34.00  mon 16 jul 1990
 from eem
 to   hui
 cc   dhs jkt kei
 subj JKT benchmark on J on various machines

 The results of executing:

    6X.2'%.?50 50$1000'

 using J version 1 on various machines are as follows:

 ---------  Machine  --------------------------------------  ----Result---
 IBM PC, 8088, 4.77 MHz, MS/DOS, no math chip                    2801.21   v
 IBM XT, 8088, 4.77 MHz, MS/DOS                                  1680      v
 Apple Macintosh Plus, 68000, 7.8336 MHz, MacOS, no math chip    1207.08  *v
 Atari ST, 68000, Mac Simulator, no math chip                    935.1
 Packard-Bell PC/AT, MS/DOS, no math chip                        525.495  *
 QSP Super Micro 286AT, 8 MHz MS/DOS, no math chip               521.099  *v
 AT&T 3B1, 68010, 10MHz, UNIX, no math chip                      442.332  *v
 IBM PS/2 55, 386 SX, 20 MHz, MS/DOS, no math chip               341.044
 IBM PS/2 70, 386, 25 MHz, MS/DOS                                230.879   v
 Apple Macintosh IIx, 68030, 16.67 MHz, MacOS                    227.25    v
 Apple Macintosh SE/30, 68030, 16.67 MHz, MacOS                  201.55    v
 Sun 3/60, 68020, 16.67 MHz, UNIX                                162.027  *v
 Philips P9070, 68020, 16.67 MHz, UNIX                           158
 Apple Macintosh IIci, 68030, 25 MHz, MacOS                      152.35
 Sun 386i/250, 25 MHz, DOS window PC/AT emulator                 111.758  *
 Apple Macintosh IIfx, 68030, 40 MHz, MacOS                       76.83
 Sun 386i/250, 25 MHz, UNIX                                       73.947  *
 Sun Sparcstation 1+ (Sun 4/65), UNIX                             28.0322 *
 IBM RS/6000/320, 20 MHz, AIX                                     20.36
 Mips R3240, Mips R3000, 25 MHz, UNIX                             12.05    v

 Entries followed by a * are new or have added information.
 Timing accuracy in entries followed by a v has been verified by stopwatch.

Ten days later, Roger responded with:

 no. 4381718 filed 15.50.43  thu 26 jul 1990
 from hui
 to   eem
 cc   dhs jkt kei
 subj Further JKT benchmarks

 6X.2 '%.?50 50$1000' on a MIPS machine at Waterloo gave 12 point
 something, approximately equal to the 12.05 you reported.  On a NeXT
 machine, the figure was 101.141.  On a VAX, J is not yet stable enough
 to execute the benchmark.

 Contrary to what I said before, 6X.2 gives CPU time and not elapsed
 time.

Through all of this, there were valid criticisms of my
favorite benchmark - e.g.

 no. 2772762 filed 22.47.36  tue 10 may 1988
 from rdm
 to   jkt
 cc   rbe
 subj performance

 yes but what about the single most important apl expression ever coded:

      <Qdivide>50 50<rho>1e9


 no. 3492475 filed 20.09.40  mon 12 jun 1989
 from rbe
 to   akr
 cc   jkt
 subj fyi

 Note that jkt's favorite benchmark is strongly floating-point biased,
 and hence doesn't reflect most sapl site instruction mixes... Bob


In talking about this with Roger Hui in the early days of j,
I suggested that maybe using my favorite benchmark was
inappropriate for looking at j. He countered that, because of
the way he had implemented matrix inverse (coded with j
primitives rather than the hand-tuned assembler used in
mainframe systems), he reckoned it was a pretty fair test.
So I have continued to use it to evaluate systems over the
17 years since then. I don't know if it is still true that
%. is a "fair benchmark"; Roger would have to comment on
that....

Over the years (and I've spared you the messages from my 30 years
of saved emails...) I was often confused by things I found using
my little test. Now, once again, I'm confused. I asked Roger
if he could clear up my confusion, and he said he had no idea
why I might have observed my puzzling results.

So, finally, here is my current conundrum.

/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\

I just got my hands on a new 24 inch iMac (what an incredibly
lovely and powerful computer!!). Of course, the first thing I
did was install j602c on it (the new installation process is
light years ahead of past Mac installations - thank
you again, Eric, for all the work!). The next thing I did was
run my favorite benchmark (increasing the size of the matrix
a bit). Here is the question I posed to Roger and now ask the
collective thoughts of the forum -

I started out timing  %. 100 100 ?@$ 1000   but that takes such
a short time that I changed to 500 500.... The two machines
giving rise to my confusion are a Linux box with a dual core
Intel cpu - and an OS 10.4 Mac with a dual core Intel cpu.
On the Linux box, an excerpt from cat /proc/cpuinfo:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Intel(R) Pentium(R) 4 CPU 3.00GHz
stepping        : 3
cpu MHz         : 2992.720
cache size      : 2048 KB

memory size on Linux (Redhat Fedora Core6) is 1.5 Gbytes.

On the Mac -

  Processor Name:       Intel Core 2 Duo
  Processor Speed:      2.4 GHz
  Number Of Processors: 1
  Total Number Of Cores:        2
  L2 Cache (per processor):     4 MB
  Memory:       1 GB
  Bus Speed:    800 MHz

Both machines are running j602c, with the timing utility

   ts =: 6!:2 , 7!:2@]

(6!:2 gives execution time in seconds; 7!:2 gives the space
used in bytes.)

On Linux box -

   10 ts '%. 500 500 ?@$ 1000'
1.07966 1.57304e7

On Mac -

   10 ts '%. 500 500 ?@$ 1000'
0.492236 1.57304e7

This surprises me, since one would guess the 3.00 GHz machine
would be faster than the 2.4 GHz machine - instead it runs at
half the speed... I thought it might be the cache size, but a
smaller matrix produces:

   10 ts '%. 100 100 ?@$ 1000'
0.0135964 984768

and

   10 ts '%. 100 100 ?@$ 1000'
0.0088636 984768

respectively - so the cache would seem not to be the issue.
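For a sanity check outside of j, here is a rough NumPy sketch of
the same benchmark. This is my own assumption, not something from
the j source: that %. on a square matrix and numpy.linalg.inv
both come down to an LU-style factorization, so both mostly
measure floating-point throughput. The name bench_inverse is
just for illustration.

```python
import time
import numpy as np

def bench_inverse(n, reps=10):
    # Build an n-by-n matrix of random integers below 1000, held as
    # floats, roughly mirroring the random matrix the j benchmark inverts.
    rng = np.random.default_rng()
    a = rng.integers(0, 1000, size=(n, n)).astype(float)
    t0 = time.perf_counter()
    for _ in range(reps):
        np.linalg.inv(a)
    # Average wall-clock seconds per inversion.
    return (time.perf_counter() - t0) / reps

print(bench_inverse(500))
```

If NumPy (linked against the platform BLAS/LAPACK) shows the same
2:1 ratio between the two machines, the difference is in the
hardware's floating-point performance rather than in j itself.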

I had noticed this 2:1 ratio, and so when I was looking at spirals
yesterday, I found no such difference, that is -

   JKT =: ,~ $ /:@(+/\)@Increments
     Increments =: _1&|. @ (# Cycles) @ Repeats
        Repeats =: ,~ 2: # i.
        Cycles =: # $ (,-)@(,1:)@{:

   JKT 5
12 11 10  9 24
13  2  1  8 23
14  3  0  7 22
15  4  5  6 21
16 17 18 19 20
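For readers who don't speak tacit j, here is a plain Python loop
that produces the same spiral. This is my own formulation (spiral
out from the center with run lengths 1, 1, 2, 2, 3, 3, ...), not a
transliteration of the tacit definition; it matches the JKT 5
result shown above.

```python
def jkt(n):
    # Spiral out from the center: value k marks the k-th cell visited.
    # Directions cycle up, left, down, right; run lengths grow
    # 1, 1, 2, 2, 3, 3, ... as in the printed JKT 5 example.
    m = [[0] * n for _ in range(n)]
    r = c = n // 2                                 # start at the center
    deltas = [(-1, 0), (0, -1), (1, 0), (0, 1)]    # up, left, down, right
    k, run, d = 0, 1, 0
    while k < n * n - 1:
        dr, dc = deltas[d % 4]
        for _ in range(run):
            if k == n * n - 1:
                break                              # grid is full
            r, c = r + dr, c + dc
            k += 1
            m[r][c] = k
        d += 1
        if d % 2 == 0:                             # lengthen run every
            run += 1                               # second turn
    return m
```

This version is all integer indexing and assignment, which is why
a benchmark like JKT 1001 exercises a machine quite differently
from the floating-point-heavy %. test.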

On the 3 GHz Linux box -

   ts 'JKT 1001'
0.080173 1.6778e7

On 2.4 GHz Mac -

   ts 'JKT 1001'
0.0818748 1.6778e7

So this seems relatively close. I suppose the speed
difference could be in processing floats, but that
surprises me too. Have you experienced similar things?
Any ideas what is going on?

/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\

After posing the above, I watched CPU monitors to verify
that j only uses one core at a time, so it isn't that OS X
somehow automatically uses both. But still, I can't imagine
why a 2.4 GHz machine is twice as fast as a 3 GHz one...
To further my confusion, when I installed OS 10.5 (Leopard)
the timings became even a little more favorable - this was
a surprise as well. The change wasn't large, but 10.5 is
consistently a few percent faster than 10.4...

Comments/thoughts?

- joey



----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm