>How fast can a processor run if it read the memory at 100MHz
>but it's running at 300MHz and spend 90% of is time copying
>memory from one place to another ??? (It seems I've tryed out of
>order execution with my English!!!)
this is a major problem in CPU design these days.. it's called 'processor
stalling'.
the increases in CPU speed over the last few years haven't been because of
faster electronics or radically better logic designs (some.. but not much),
the real speed increases have come from doing more than one thing at once:
putting more than one processing unit on the same chip, and running them in
parallel.
the official geekspeak for it is 'superscalar microprocessing' when the
parallel units live inside a single processor, or 'symmetric
multiprocessing' when you gang up several complete CPUs. in human terms,
either way it means 5 people trying to share a single bathroom in the
morning while getting ready for work.
assuming you have enough floor space to accommodate everyone, you don't
really need five complete and separate bathrooms.. people can take turns
with the shower, sink, and other assorted porcelain. the best solution is
somewhere between a single-person bathroom and a small locker room, though.
double up on a few fixtures here and there to reduce waiting, but expect
the occasional *brief* traffic jam.
in CPU terms, the sharing issues involve the separate processors and
various parts of memory. obviously, all 5 CPUs can't write to the same
address at the same time, but making them take turns kills your processing
speed. fortunately, the odds of two processors needing the same address
at the same time are fairly low, so you can afford to take a small hit in
efficiency on the rare occasions when an address collision does occur.
the bad news is that the data bus, which carries information between the
CPU and RAM, is much, much slower than even a single modern CPU. a single
modern processor can crunch numbers about 20 times as fast as the bus can
deliver them from memory, and a superscalar processor, with 3-5 processing
units on the same chip, outruns the bus by an even more ridiculous margin.
the way around the problem is to build small buffers of RAM directly into
the CPU itself. that's what caching is all about. a cache is a small
block of memory (maybe 8-64K) where the processor can store numbers it's
using *right now*. with proper optimizations (which i won't even try to
discuss.. very boring), you can load data from primary RAM into the cache
in such a way that 95%+ of the data the processor needs can be found within
the cache.
the 95% hit rate slows down the demand on the data bus to about 1/20th of
what it was.. which is convenient, since the CPU is 20 times as fast as the
bus. the whole issue of 'prefetching' memory deals with organizing data
in RAM so you can load a whole bunch of stuff into the cache with one slow,
tedious memory call, and have most of it be information the CPU will want
to use between now and the time the next batch of data is due to arrive.
this involves nasty-deep geeking, and coordination between the physical
electronics and the compilers which generate software for that platform.
frankly, it's easier to write it off to elves and faeries.
caching solves the speed problem for a single CPU. but a superscalar
processor has multiple CPUs, all running at the same time. we've still
got a logjam, but this one is only, say, 5 times as much demand as the data
bus can handle.. not 100 times.
caching is a good trick, though. not only can you use it once, you can
use it again. what chip manufacturers do is build 5 separate CPUs, all
with their own private caches, onto the same chunk of silicon. but
instead of connecting those private caches directly to the data bus, they
throw in another, general cache between the CPU caches and RAM. that's
your Level-2 Backside Cache, which everyone is making so much noise about
these days.
the L2 cache is on the same chip as the processors, so it's still a lot
faster than the data bus. it's still slower than the private caches, but
that's okay, because the private caches are catching most of the load from
the individual CPUs. the L2 cache smooths out about 80-90% of the load
from the primary caches, dropping the load on the data bus back to a level
it can actually handle.
to make that work, though, the bus between the L2 cache and the primary
caches has to be awfully fast. to do that, chip designers make the bus
wide enough that the L2 cache can talk to multiple CPUs at the same time..
basically the same issue as deciding how many sinks to put into our group
bathroom, earlier.
due to the magic of optimized data storage, anything one of the CPUs on a
superscalar microprocessor needs will probably also be needed by one of its
neighbors. that makes a single read-cycle by the L2 cache valuable to all
the CPUs at once. there will still be times when two processors need the
same address at the same time, but by this time, those cases are so rare
that you only lose a few microseconds of processing efficiency per minute.
L2 cache issues are what ultimately killed the x86 line of processor
development. the bus between the primary caches and the L2 cache isn't
wide enough to keep the processors from stalling, and making the L2 cache
bigger doesn't help enough. G3 PowerPCs get better performance even at
lower clock speeds, because they have a new and very efficient connection
between the L2 cache and the primaries.
so.. how fast is a 300MHz processor if it spends 90% of its time stalled on
a 100MHz data bus? about as fast as a single, good 386.
mike stone <[EMAIL PROTECTED]> 'net geek..
been there, done that, have network, will travel.