A 4-way SMP Opteron system is actually pretty damn cheap -- if you get 2xDual Core versus 4xSingle. I just ordered a 2x265 (4x1.8ghz) system and the price was about $1300 more than a 2x244 (2x1.8ghz).

Now you might ask, is a 2xDC comparable to 4x1? Here are some benchmarks I've found showing DC versus Single @ the same clock rates/same # cores.

SpecIntRate Windows:
4x846 = 56.7
2x270 = 62.6

SpecFPRate Windows:
4x846 = 52.5
2x270 = 55.3

SpecWeb99SSL:
4x846 = 3399
2x270 = 4100 (2 870s were used)

Specjbb2000 IBM JVM:
4x848 = 146385
4x275 = 157432

What it looks like is that a DC system is about 1 clock step faster than a corresponding single core SMP system. E.g. if you have a 2xDC @ 1.8ghz, you need a 4x1 @ 2ghz to match the speed. (In some benchmarks, the difference is 2 clock steps.)
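To put a number on that "one clock step", here's a quick sketch that divides each dual-core score by the matching single-core score. The scores are the ones listed above; the dictionary layout and the names (`scores`, `speedup`, `clock_step`) are just mine for illustration.

```python
# Scores copied from the benchmark lists above: (single-core SMP, dual-core).
scores = {
    "SpecIntRate Windows": (56.7, 62.6),
    "SpecFPRate Windows": (52.5, 55.3),
    "SpecWeb99SSL": (3399, 4100),
    "Specjbb2000 IBM JVM": (146385, 157432),
}


def speedup(single, dual):
    """Return the dual-core score as a multiple of the single-core score."""
    return dual / single


ratios = {name: speedup(s, d) for name, (s, d) in scores.items()}

# One clock step as used in the example above: 1.8 GHz -> 2.0 GHz.
clock_step = 2.0 / 1.8

for name, r in ratios.items():
    print(f"{name}: {r:.3f}x")
print(f"1.8 -> 2.0 GHz clock step: {clock_step:.3f}x")
```

By this arithmetic the dual-core gain runs from roughly 5% to roughly 21%, against about 11% for the clock step, so "about one step" is a ballpark rather than a rule.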

On the surface, it looks pretty amazing that a 4x1 Opteron with twice the memory bandwidth is slower than a corresponding 2xDC. (DC Opterons use the same socket as plain jane Opterons so they use the same 2xDDR memory setup.) It turns out the latency in a 2xDC setup is just so much lower, and most apps care more about lower latency than higher bandwidth. Look at the diagram of the following Tyan 4-processor MB:

ftp://ftp.tyan.com/datasheets/d_s4882_100.pdf

Take particular note of the lack of diagonal lines connecting CPUs. What this means is if a process running on CPU0 needs memory attached to CPU3, it must ask either CPU1 or CPU2 to forward the request for it. Without NUMA support, we're looking at 25% of memory accesses running @ 50ns, 50% @ 110ns, 25% @ 170ns. (Rough numbers, I'd have to do a lot of googling to find the exact latencies but I'm just too lazy now.)

Now consider a 2xDC system. The 2 cores inside a single package are connected by an immensely fast internal SRQ connection. As long as there's no bandwidth limitation, both cores have full-speed access to memory, while core-to-core snooping of the other core's cache takes roughly 10ns. So memory access speeds look like so: 50% 50ns, 50% 110ns.

If the memory locations you need to access happen to be contained in the L1/L2 cache, the difference is even more pronounced. You then get memory access patterns for 4x1: 25% 5ns, 50% 65ns, 25% 125ns versus 2xDC: 25% 5ns, 25% 15ns, 50% 65ns.
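For a rough feel of what those mixes add up to, here's a minimal sketch that takes the weighted average of each one. It assumes accesses really are spread in the proportions given and uses the same rough latencies as above; the helper name `average_latency` and the variable names are made up for illustration.

```python
def average_latency(mix):
    """Weighted average latency in ns for a list of (fraction, latency_ns)."""
    total = sum(frac for frac, _ in mix)
    assert abs(total - 1.0) < 1e-9, "fractions must sum to 1"
    return sum(frac * ns for frac, ns in mix)


# Main-memory case (no NUMA support), rough numbers from the text above.
mem_4x1 = [(0.25, 50), (0.50, 110), (0.25, 170)]
mem_2xdc = [(0.50, 50), (0.50, 110)]

# Cache-hit case, rough numbers from the text above.
cache_4x1 = [(0.25, 5), (0.50, 65), (0.25, 125)]
cache_2xdc = [(0.25, 5), (0.25, 15), (0.50, 65)]

print("memory 4x1 :", average_latency(mem_4x1), "ns")     # 110.0 ns
print("memory 2xDC:", average_latency(mem_2xdc), "ns")    # 80.0 ns
print("cache  4x1 :", average_latency(cache_4x1), "ns")   # 65.0 ns
print("cache  2xDC:", average_latency(cache_2xdc), "ns")  # 37.5 ns
```

With these guesses the 2xDC box averages about 27% lower latency from main memory and about 42% lower on cache hits, which is the gap the benchmarks seem to be reflecting.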



Joel Fradkin wrote:
Thank you much for the info.
I will take a look. I think the prices I have been seeing may exclude us
getting another 4 proc box this soon. My boss asked me to get something in
the 15K range (I spent 30 on the Dell). The HP seemed to run around 30 but it had a lot more drives than the Dell
(specced it with 14 10k drives).
