In message from Mark Hahn [EMAIL PROTECTED] (Fri, 12 Oct 2007
16:09:05 -0400 (EDT)):
This means that 2 additional FP results per cycle in the
microarchitecture give only about a 7% performance increase :-(
the 4 flops/cycle is really for linpack-like code: it assumes you are
executing packed
In message from [EMAIL PROTECTED] (Fri, 12 Oct 2007 20:50:08
+):
Mikhail,
I am not sure I fully understand what you are presenting here, but I
might say that yes, at the FPU level the AMD Opteron/Barcelona series
and the Intel Core2/Clovertown (and also Harpertown at 45 nm) are
I found the first AMD quad-core (Opteron 2347/1.9 GHz) SPECfp2006
results (at www.spec.org), obtained by IBM: 11.2/10.7 for peak/base
values. I'll talk about 1 core only, i.e. results w/Autoparallel=NO.
Let me look at another x86-64 microarchitecture w/the same 4*64-bit FP
results per cycle, i.e. Intel
This means that 2 additional FP results per cycle in the microarchitecture
give only about a 7% performance increase :-(
the 4 flops/cycle is really for linpack-like code: it assumes you are
executing packed double SIMD.
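
As a rough illustration of the point above: theoretical per-core peak is
just flops/cycle times clock. The sketch below uses only figures quoted
in this thread (4 flops/cycle at 1.9 GHz for the Opteron 2347, 2
flops/cycle at 3 GHz for the previous Opteron generation). The helper
name peak_gflops is made up for illustration, and these numbers are
ceilings that assume packed double SIMD on every cycle, not predictions
of SPECfp2006 scores.

```python
def peak_gflops(flops_per_cycle, clock_ghz):
    """Theoretical per-core peak in GFLOPS (hypothetical helper).

    Assumes every cycle retires the full number of double-precision
    results, which is only realistic for linpack-like packed SIMD code.
    """
    return flops_per_cycle * clock_ghz


# Figures taken from the thread; nothing here is measured.
barcelona = peak_gflops(4, 1.9)     # Opteron 2347, 1.9 GHz
prev_opteron = peak_gflops(2, 3.0)  # previous-generation Opteron, 3 GHz

print("Barcelona peak GFLOPS/core:    ", barcelona)
print("Previous Opteron GFLOPS/core:  ", prev_opteron)
print("Peak ratio (new/old):          ", round(barcelona / prev_opteron, 2))
```

On paper the quad-core part is only modestly ahead per core because of
its lower clock, and code that is not packed SIMD will not see even that.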
The question is - should we wait for some better results for new incoming
-- Original message --
From: Mikhail Kuzminsky [EMAIL PROTECTED]
But if I compare SPECfp2006 results w/an x86-64 microarchitecture
w/2*64-bit FP results per cycle - the previous Opteron generation - I
see a strange (IMHO) result. So, for Opteron SE/3 GHz,