On Tue, 22 May 2007, rochelle lauer wrote:
Hello,
I am trying to understand some weird performance characteristics
on a newly purchased blade (see statistics below).
The hardware is an HP BL465 with 2 dual core AMD HE 2216 processors.
This is the first AMD and first 64 bit processor we have bought.
Is this system a NUMA based system?
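Opterons have a memory controller on each socket, so a two socket board is normally NUMA unless node interleaving is turned on in the BIOS. One way to see what the running kernel thinks is to look for node directories in sysfs. The sketch below assumes Python 3 and a kernel that exposes /sys/devices/system/node; the function name numa_nodes is just something made up for this example.

```python
import glob
import os
import re


def numa_nodes(sys_root="/sys/devices/system/node"):
    """Return the sorted NUMA node ids found as nodeN directories under sys_root."""
    ids = []
    for path in glob.glob(os.path.join(sys_root, "node[0-9]*")):
        match = re.match(r"node(\d+)$", os.path.basename(path))
        # Only count real nodeN directories, not stray files.
        if match and os.path.isdir(path):
            ids.append(int(match.group(1)))
    return sorted(ids)


if __name__ == "__main__":
    nodes = numa_nodes()
    if len(nodes) > 1:
        print("kernel reports NUMA nodes:", nodes)
    else:
        print("kernel reports a single memory node (or no node info):", nodes)
```

An empty list only means the kernel did not publish node information there, not that the hardware has a single memory controller.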
I installed SL44 x86_64 and we did some performance tests.
When running a single job (compute bound monte-carlo with HBOOK output)
the performance was about twice as slow as running on our
Intel based blade. Although this difference
could be attributed to the difference in
processors, running several single
How does the amount of memory compare to the intel based tests?
jobs in a row produced rather erratic results...
200-300 seconds different on a 900 second job.
Some were comparable to the 32 bit processor, some were not.
Also, running 4 of the same jobs in parallel
produced results which were almost twice as fast !
I then (for fun) installed SL43 x86_64. This produced results
quite different from those on SL44 and more comparable to
our 32 bit blades.
Below is a sample of the CPU statistics
We first ran the existing 32 bit executable.
We then recompiled and ran the 64 bit executable.
Many of our jobs cannot be recompiled (won't compile on gcc 3.4 or have
missing libraries) so we would really like to understand this performance
discrepancy on 32 bit executables and SL44.

32 bit executable, single job
          SL44      SL43
          906 sec   556 sec

32 bit executable, 4 jobs in parallel
          SL44      SL43
job 1     452 sec   446 sec
job 2     446 sec   442 sec
job 3     445 sec   444 sec
job 4     448 sec   446 sec

64 bit executable, single job
          SL44      SL43
          510 sec   497 sec

The 64 bit executable seems to be a little more predictable
So, does anyone have any idea
1. Why such a difference in performance between SL44 and SL43 (Why does
SL44 produce much slower results on a single job)
Not enough info to determine this.
The biggest difference between SL43 and SL44 is that the kernel has
changed.
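To put numbers on the gap, here is a small sketch that only does arithmetic on the 32 bit timings you quoted. Nothing in it is measured by me; the variable and function names are made up for the example.

```python
# Timings in seconds, copied from the 32 bit executable figures quoted above.
single = {"SL44": 906, "SL43": 556}
parallel = {
    "SL44": [452, 446, 445, 448],
    "SL43": [446, 442, 444, 446],
}


def mean(values):
    return sum(values) / len(values)


def slowdown(release):
    """Single-job time divided by the mean per-job time with 4 jobs in parallel."""
    return single[release] / mean(parallel[release])


for release in ("SL44", "SL43"):
    print("%s: single job takes %.2fx the parallel per-job time"
          % (release, slowdown(release)))
print("SL44 single vs SL43 single: %.2fx" % (single["SL44"] / single["SL43"]))
```

This works out to about 2.02x on SL44, 1.25x on SL43, and 1.63x between the two single-job runs. The parallel per-job times are nearly the same on both releases, so the number that stands out is the SL44 single job.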
2. Why running 4 jobs in parallel produces faster results than
a single job ? One would think jobs running in parallel
would produce slightly slower performance.
That depends on what the jobs are doing.
3. Why running 4 jobs in parallel on SL44 produces much
faster results (900 sec vs 452 sec) .
I suggest you try some of the performance tools, such as oprofile and
vmstat, to help determine what is going on.
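Before going to a profiler, it may also help to pin down the run-to-run variation by repeating the same job and recording wall-clock time next to the CPU time charged to it. The following is only a sketch, assuming Python 3 on the test machine; time_runs is a name invented here and the command in the usage part is a placeholder for your real job.

```python
import resource
import subprocess
import sys
import time


def time_runs(cmd, repeats=5):
    """Run cmd `repeats` times; return a list of (wall_seconds, cpu_seconds)."""
    results = []
    for _ in range(repeats):
        before = resource.getrusage(resource.RUSAGE_CHILDREN)
        start = time.monotonic()
        subprocess.call(cmd)  # waits for the job to finish
        wall = time.monotonic() - start
        after = resource.getrusage(resource.RUSAGE_CHILDREN)
        # User plus system CPU time consumed by the child just run.
        cpu = ((after.ru_utime - before.ru_utime)
               + (after.ru_stime - before.ru_stime))
        results.append((wall, cpu))
    return results


if __name__ == "__main__":
    # Placeholder command; substitute the real monte-carlo job here.
    job = [sys.executable, "-c", "sum(range(10**6))"]
    for i, (wall, cpu) in enumerate(time_runs(job, repeats=3), 1):
        print("run %d: wall %.2f s, cpu %.2f s" % (i, wall, cpu))
```

If the wall time swings while the CPU time stays steady, the job is probably waiting on something. If the CPU time itself swings by hundreds of seconds, the same work is costing more cycles on some runs, which would point more toward memory placement or scheduling than toward I/O.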
4. Should we not be running our 32 bit executables with an
SLxx x86_64 installed ?
I have not yet tried installing SL44(43) x86 to check the
performance. Should I ?
Most see a performance improvement running 32 bit on a 64 bit OS. This has
been seen quite a bit with AMD 64 bit Opteron CPUs because their memory
bandwidth is higher.
Thanks for any insight or help
Regards
Rochelle Lauer
Yale University Physics
-Connie Sieh