On Tue, 2006-06-27 at 11:38 -0700, James Richard Tyrer wrote:
> One of the reasons that SMP systems don't scale proportionately is that
> the processors compete for memory access. It should be noted that this
> is also the reason that faster clock speeds don't scale proportionately
> -- memory access becomes the limiting factor.

Intel CPUs have a traditional Front Side Bus (FSB): a single bus shared
by the chipset, memory and cpu(s). That gives you two problems: several
components compete for bandwidth, and there is electrical noise. Recent
example: on the Xeon line of cpus they could only run the FSB at 533MHz
instead of the 800MHz the P4 could do, because of the reduced signal
quality from having another cpu on the bus. This of course reduced the
total FSB bandwidth while doubling the theoretical computation rate. In
other words, the already memory/io-starved cpu got even worse off when
using two cpus. With 4-8 cpus this problem gets really critical, since
the FSB still stays at 533MHz. This is to some extent patched over by
increasing cache sizes, something Intel has done to a great extent
lately.

AMD has solved this problem excellently. In a single-cpu system the cpu
has an integrated memory controller with 1 or 2 memory channels, plus a
separate Hyper Transport (HT) connection to the chipset. This makes sure
that chipset and memory traffic never have to compete for bandwidth, and
the latency for the cpu to request something from memory is also greatly
reduced.

In 4-cpu systems each cpu has 3 HT connections plus 2 memory channels.
Thus you have 8 memory channels in total, which provides a huge maximum
memory bandwidth. Cpu 1 uses one HT connection to connect to the
chipset; the other two are used to connect to cpus 2 and 4.

Cpu 2 uses its HTs to connect to cpus: 1,3,4
Cpu 3 uses its HTs to connect to cpus: 2,4
Cpu 4 uses its HTs to connect to cpus: 1,2,3

As you can see, cpus 1 and 3 have no direct connection and have to relay
through cpu 2 or 4 (whichever has the least load). This really has no
big impact on performance, since the HT links are more than fast enough
to handle the traffic. It does mean that communication between cpus,
from any cpu to any memory chip, and from any cpu to the chipset is a
lot more complicated than just using a common FSB.

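As a rough illustration of the relay point, the link layout listed above
can be written down as a small graph and searched for the shortest path
between two cpus. This is only a sketch in Python: the names LINKS, hops
and memory_channels are made up for the example, the chipset link is
left out, and it says nothing about how the hardware actually picks a
route.

```python
from collections import deque

# HT links between cpus in the 4-cpu layout described in the post.
# Cpu 1's third link goes to the chipset and is not modelled here.
LINKS = {
    1: [2, 4],
    2: [1, 3, 4],
    3: [2, 4],
    4: [1, 2, 3],
}


def hops(src, dst, links=LINKS):
    """Return the minimum number of HT links between two cpus.

    Plain breadth-first search over the link graph (an assumption made
    for illustration, not a model of real HT routing).
    """
    dist = {src: 0}
    queue = deque([src])
    while queue:
        cpu = queue.popleft()
        if cpu == dst:
            return dist[cpu]
        for peer in links[cpu]:
            if peer not in dist:
                dist[peer] = dist[cpu] + 1
                queue.append(peer)
    raise ValueError("no route between cpu %r and cpu %r" % (src, dst))


def memory_channels(cpus=4, channels_per_cpu=2):
    """Total memory channels when every cpu brings its own channels."""
    return cpus * channels_per_cpu


if __name__ == "__main__":
    # Print the hop count for every pair of distinct cpus.
    for a in sorted(LINKS):
        for b in sorted(LINKS):
            if a < b:
                print("cpu %d -> cpu %d: %d hop(s)" % (a, b, hops(a, b)))
    print("memory channels:", memory_channels())
```

With this layout hops(1, 3) comes out as 2 (via cpu 2 or 4), while every
other pair of cpus is one link apart.
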
Complicated or not, the HT design is regarded as a _very_ good solution
to the problem, since it scales up to 8 cpus very nicely. Also, if the
OS supports NUMA, it can make sure that a process runs on a cpu close to
the memory it uses, instead of being placed at random. This is a great
thing on big systems, since on 8-way systems a cpu might need several
relays to reach a distant memory chip. Smaller systems gain relatively
little from it.

Intel is rumored to be implementing a design similar to AMD's in its
upcoming cpus.

I hope this gives you some valuable information, and hopefully I don't
have any bad errors in my interpretation of the designs.

-HK
