On Tue, 2006-06-27 at 11:38 -0700, James Richard Tyrer wrote:
> One of the reasons that SMP systems don't scale proportionately is that 
> the processors compete for memory access.  It should be noted that this 
> is also the reason that faster clock speeds don't scale proportionately 
> -- memory access becomes the limiting factor.

Intel CPUs have a traditional Front Side Bus (FSB). This means a single
shared bus between chipset, memory and cpu(s). In other words you have
two problems: several components compete for bandwidth, and every extra
device on the bus adds electrical noise.

Recent example: on the Xeon line of cpus they could only do 533MHz
instead of 800MHz like the P4 could do, because of the reduced signal
quality due to having another cpu on the bus. This of course reduced
the total FSB bandwidth while doubling the theoretical computation rate.

In other words, the already memory/io starved cpu is starved even more
when there are two cpus. With 4-8 cpus this problem gets really critical
since the FSB is still constant at 533MHz. This is to some extent
patched over by increasing cache sizes, something Intel has done to a
great extent lately.
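
To put rough numbers on this, here is a small back-of-the-envelope
sketch. It assumes a 64-bit (8 byte) wide FSB, which is my assumption
and not something stated above, and it treats the quoted MHz figures as
the effective transfer rate. The function name is made up for the
example. It only gives theoretical peaks, not measured throughput.

```python
# Peak FSB bandwidth per cpu, under the assumptions named above.
BUS_WIDTH_BYTES = 8  # assumption: 64-bit wide data bus


def fsb_bandwidth_gb_s(rate_mhz, cpus=1):
    """Theoretical peak bandwidth in GB/s for each cpu sharing one bus."""
    total_gb_s = rate_mhz * 1e6 * BUS_WIDTH_BYTES / 1e9
    return total_gb_s / cpus


if __name__ == "__main__":
    cases = [
        ("P4, 800MHz, 1 cpu", 800, 1),
        ("Xeon, 533MHz, 2 cpus", 533, 2),
        ("Xeon, 533MHz, 4 cpus", 533, 4),
        ("Xeon, 533MHz, 8 cpus", 533, 8),
    ]
    for label, rate, cpus in cases:
        share = fsb_bandwidth_gb_s(rate, cpus)
        print(f"{label}: {share:.2f} GB/s per cpu")
```

Under these assumptions a lone P4 gets about 6.4 GB/s to itself, while
each cpu in a dual Xeon box gets roughly 2.1 GB/s, and the share keeps
shrinking as more cpus are added to the same bus.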


AMD has solved this problem excellently.
In a single-cpu system the cpu has an integrated memory controller
with 1 or 2 memory channels, plus a separate HyperTransport (HT)
connection to the chipset. This makes sure that chipset traffic and
memory traffic never have to compete for bandwidth, and the latency
for the cpu to request something from memory is also greatly reduced.

In 4-cpu systems each cpu has 3 HT connections plus 2 memory channels
each. Thus you have 8 memory channels, and this provides a huge max
memory bandwidth.
Cpu 1 uses one HT connection to connect to the chipset; the other
two connect to cpus 2 and 4.
Cpu 2 uses its HTs to connect to cpus: 1,3,4
Cpu 3 uses its HTs to connect to cpus: 2,4
Cpu 4 uses its HTs to connect to cpus: 1,2,3

As you can see, cpus 1 and 3 have no direct connection and have to
relay through cpu 2 or 4 (whichever has the least load).
This really has no big impact on performance since the HT links
are more than fast enough to handle the traffic.
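
The link layout above can be written down as a small graph to check
the hop counts. This is only an illustration of the topology as I
described it, with the chipset link left out. The names LINKS and hops
are made up for the example, and the breadth-first search is just a
convenient way to count links, not how the hardware routes packets.

```python
from collections import deque

# cpu-to-cpu HT links as listed above (chipset link on cpu 1 omitted)
LINKS = {
    1: [2, 4],
    2: [1, 3, 4],
    3: [2, 4],
    4: [1, 2, 3],
}


def hops(src, dst):
    """Number of HT links on a shortest path from src to dst."""
    distance = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return distance[node]
        for neighbour in LINKS[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    raise ValueError(f"no path from cpu {src} to cpu {dst}")


if __name__ == "__main__":
    for a in sorted(LINKS):
        for b in sorted(LINKS):
            if a < b:
                print(f"cpu {a} <-> cpu {b}: {hops(a, b)} hop(s)")
```

Every pair is one hop apart except cpus 1 and 3, which need two hops
and can go through either cpu 2 or cpu 4.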

This means that communication between cpus, from any cpu to any
memory chip, and from any cpu to the chipset is a lot more
complicated than with a common FSB. But it is also regarded as a
_very_ good solution to the problem since it scales upwards
to 8 cpus very nicely.

Also, if the OS supports NUMA, it can make sure that a process
runs on a cpu close to the memory it uses, instead of being
placed on an arbitrary cpu. This is a great thing on big systems,
since on 8-way systems a cpu might have to go through several
relays on its way to a distant memory chip. Smaller systems
benefit relatively little from it.
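
For the curious, this is roughly what steering a process by hand
looks like on Linux, using the affinity calls in Python's os module.
Treat it as a sketch: the helper name is made up, it only restricts
which cpus the process may run on, and it does not by itself bind
memory to a node. Whether the memory ends up local depends on the
kernel's allocation policy, and a NUMA-aware scheduler normally does
this for you.

```python
import os


def pin_current_process(cpus):
    """Restrict the calling process to the given cpu numbers (Linux only).

    Returns the affinity set reported by the kernel afterwards.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("cpu affinity is not exposed on this platform")
    os.sched_setaffinity(0, cpus)  # pid 0 means the calling process
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    if hasattr(os, "sched_getaffinity"):
        allowed = os.sched_getaffinity(0)   # remember the original mask
        first = min(allowed)                # any permitted cpu will do
        print("pinned to:", pin_current_process({first}))
        os.sched_setaffinity(0, allowed)    # restore the original mask
    else:
        print("cpu affinity is not available here")
```

Which cpu numbers belong to which memory node is system specific, so
the choice of cpu here is arbitrary.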


Intel is rumored to be implementing a design similar to AMD's in
its upcoming cpus.


I hope this gives you some valuable information, and hopefully
I haven't made any bad errors in my interpretation of the designs.

-HK

_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
