On Tue, 22 May 2007, Connie Sieh wrote:

On Tue, 22 May 2007, rochelle lauer wrote:

Hello,

I am trying to understand some weird performance characteristics
on a newly purchased blade (see statistics below).

The hardware is an HP BL465 with 2 dual core AMD HE 2216 processors.
This is the first AMD and the first 64-bit processor we have bought.

Is this system a NUMA-based system?

NUMA can be involved in performance issues. If the system has NUMA and it is enabled, try turning it off and rerunning your tests.
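
One way to check whether the kernel currently sees more than one NUMA node is to count the node directories in sysfs. This is only a minimal sketch: it assumes a Linux kernel that exposes /sys/devices/system/node, and the function name count_numa_nodes is made up for illustration.

```python
import glob
import os
import re


def count_numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Return the number of NUMA nodes listed under sysfs_root (0 if none)."""
    pattern = os.path.join(sysfs_root, "node*")
    # Keep only entries named node<N>; other entries may share the prefix.
    nodes = [p for p in glob.glob(pattern)
             if re.fullmatch(r"node\d+", os.path.basename(p))]
    return len(nodes)


if __name__ == "__main__":
    # Assumption: a count above 1 means the kernel is treating the box as NUMA.
    print("NUMA nodes visible to the kernel:", count_numa_nodes())
```

A result of 0 or 1 would suggest that NUMA is either absent or disabled; anything higher means memory placement could plausibly affect the timings.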

-Connie Sieh


I installed SL44 x86_64 and we did some performance tests.

When running a single job (compute bound monte-carlo with HBOOK output)
the performance was about twice as slow as running on our
Intel based blade. Although this difference
could be attributed to the difference in
processors, running several single

How does the amount of memory compare to that in the Intel-based tests?

jobs in a row produced rather erratic results...
200-300 seconds different on a 900 second job.
Some were comparable to the 32 bit processor, some were not.
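
To put a number on how erratic the back-to-back runs are, the same job can be timed repeatedly and the spread summarized. The following is a rough sketch, not the procedure used for the figures in this message; time_command, summarize, and the executable name in the closing comment are all hypothetical.

```python
import statistics
import subprocess
import time


def time_command(cmd, runs=5):
    """Run cmd `runs` times in a row; return wall-clock seconds per run."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)  # raises if the job exits non-zero
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return mean, sample standard deviation, and max-min spread."""
    return {
        "mean": statistics.mean(durations),
        "stdev": statistics.stdev(durations) if len(durations) > 1 else 0.0,
        "spread": max(durations) - min(durations),
    }

# Example with a placeholder executable name:
#   print(summarize(time_command(["./mc_job"], runs=5)))
```

Comparing the spread across SL43 and SL44 would show whether the variation itself, and not only the mean, differs between the two releases.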

Also, running 4 of the same jobs in parallel
produced results which were almost twice as fast!

I then (for fun) installed SL43 x86_64. This produced results
quite different from those on SL44 and more comparable to
our 32 bit blades.

Below  is a sample of the CPU statistics

We first ran the existing 32 bit executable.

We then recompiled and ran the 64 bit executable.

Many of our jobs cannot be recompiled (won't compile on gcc 3.4 or have
missing libraries) so we would really like to understand this performance
discrepancy on 32 bit executables and SL44.

32 bit executable, single job

          SL44        SL43
          906 sec     556 sec

32 bit executable, 4 jobs in parallel

          SL44        SL43
job 1     452 sec     446 sec
job 2     446 sec     442 sec
job 3     445 sec     444 sec
job 4     448 sec     446 sec


64 bit executable single job

   510 sec                    497  sec
   The 64 bit executable seems to be a little more predictable


So, does anyone have any idea:

 1. Why such a difference in performance between SL44 and SL43 (Why does
    SL44 produce much slower results on a single job)

Not enough info to determine this.
The biggest difference between SL43 and SL44 is that the kernel has
changed.


 2. Why running 4 jobs in parallel produces faster results than
     a single job? One would think jobs running in parallel
     would produce slightly slower performance.

That depends on what they are doing.


 3. Why running 4 jobs in parallel on SL44 produces much
    faster results (900 sec vs 452 sec)?


I suggest you try some of the performance tools, such as oprofile and
vmstat, to help determine what is going on.
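
As a rough illustration of the kind of breakdown vmstat reports (user, system, idle, and I/O wait time), the aggregate cpu line of /proc/stat can be sampled twice and compared. This is a sketch only: the helper names parse_cpu_line and cpu_shares are invented here, and the field order assumes a 2.6-or-later Linux kernel.

```python
def parse_cpu_line(stat_text):
    """Parse the aggregate 'cpu' line of /proc/stat into named counters."""
    names = ["user", "nice", "system", "idle", "iowait", "irq", "softirq"]
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            values = [int(v) for v in fields[1:1 + len(names)]]
            return dict(zip(names, values))
    raise ValueError("no aggregate cpu line found")


def cpu_shares(before, after):
    """Return each counter's share of the ticks elapsed between two samples."""
    deltas = {k: after[k] - before[k] for k in before}
    total = sum(deltas.values())
    if total == 0:
        return {k: 0.0 for k in deltas}
    return {k: d / total for k, d in deltas.items()}

# Example: sample /proc/stat, wait while the job runs, then sample again.
#   with open("/proc/stat") as f: before = parse_cpu_line(f.read())
#   ... wait ...
#   with open("/proc/stat") as f: after = parse_cpu_line(f.read())
#   print(cpu_shares(before, after))
```

A large idle or iowait share during a single-job run would point away from the job being purely compute bound.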

 4. Should we not be running our 32 bit executables with an
    SLxx x86_64 installed?
    I have not yet tried installing SL44 (or SL43) x86 to check the
    performance. Should I?


Most see a performance improvement running 32-bit executables on a 64-bit OS. This has been
seen quite a bit with AMD 64-bit Opteron CPUs because memory bandwidth
is higher on these CPUs.


Thanks for any insight or help

Regards
Rochelle Lauer
Yale University Physics


-Connie Sieh
