Nick Ing-Simmons wrote:

> I have said this before but the gist of the Nick-theory is:
> 
> Page boundaries are a don't care unless there is a page miss.
> Page misses are so costly that everything else can be ignored,
> but for sane programs they should only be incurred at "startup".
> (Reducing code size e.g. no inline only helps here - less pages to load.)

Correct - crossing a page boundary just involves hitting a different TLB
entry.  However, the impact of a TLB miss can be significant in terms of
CPU cycles: between 40 and 100 or so to reload seems to be common.  Most
correctly sized systems will only page fault executable pages once,
until they are all loaded - on my system a bare-bones perl process needs
2560k of memory, and of that only 672k is libperl.so, and of that 640k
is paged in immediately.  All these numbers are tiny when compared to
the amount of memory in the machine.  If you are taking executable page
faults whilst the process is running you are really short of memory. 
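
One way to check whether a process is still taking page faults after
startup is to read its fault counters.  A minimal sketch in Python,
assuming a Unix-like system where the standard resource module is
available; the helper name fault_counts is made up for illustration:

```python
import resource

def fault_counts():
    """Return (minor, major) page-fault counts for this process so far."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    # ru_minflt: faults serviced without I/O (page was already in memory)
    # ru_majflt: faults that required I/O (page had to be read in)
    return usage.ru_minflt, usage.ru_majflt

before = fault_counts()
# Allocate and zero a buffer; this usually causes some minor faults.
buf = bytearray(8 * 1024 * 1024)
after = fault_counts()

print("minor faults:", after[0] - before[0])
print("major faults:", after[1] - before[1])
```

If the major-fault count keeps climbing long after startup, that is the
"really short of memory" case described above.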

> It is cache that matters.

And the TLB :-)  Some ballpark cache and memory figures for Sparc, Xeon
and PA-RISC (highest and lowest across all processors):

L1 cache latency:       2 - 3 cycles
L1 cache miss penalty:  6 - 112 cycles
L2 cache latency:       6 - 13 cycles
L2 cache miss penalty:  60 - 100 cycles
Main memory latency:    80 - 112 cycles 
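
To get a feel for what these figures mean for an average memory access,
here is a rough sketch using the usual "hit latency plus miss rate times
next level" model.  The latencies are the low and high ends from the
table; the miss rates (5% in L1, 20% in L2) are purely assumed for
illustration and are not measurements:

```python
def average_access_cycles(l1_latency, l2_latency, mem_latency,
                          l1_miss_rate, l2_miss_rate):
    """Average cycles per access for a two-level cache hierarchy."""
    # Cost of an access that reaches L2: L2 latency, plus main memory
    # latency for the fraction of L2 accesses that also miss.
    l2_cost = l2_latency + l2_miss_rate * mem_latency
    # Every access pays the L1 latency; L1 misses also pay the L2 cost.
    return l1_latency + l1_miss_rate * l2_cost

# Assumed miss rates, for illustration only.
L1_MISS_RATE = 0.05
L2_MISS_RATE = 0.20

best = average_access_cycles(2, 6, 80, L1_MISS_RATE, L2_MISS_RATE)
worst = average_access_cycles(3, 13, 112, L1_MISS_RATE, L2_MISS_RATE)
print("low-end figures:  %.2f cycles per access" % best)
print("high-end figures: %.2f cycles per access" % worst)
```

Even with fairly kind miss rates, the average sits well above the raw L1
latency, and it climbs quickly as the miss rates go up.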

> Modern processors (can) execute several instructions per-cycle.
> In contrast a cache miss to 100MHz SDRAM costs a 500MHz processor
> more than 5-cycles (say up to 10 instructions for 2-way super-scalar)
> per word missed.

It is worse than you think (see above).
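
The arithmetic, as a small sketch: Nick's estimate is 5 stall cycles on
a 2-way superscalar machine, while the main memory latency range in the
table above is 80 - 112 cycles.  The function name lost_issue_slots is
hypothetical, and the model simply assumes the processor issues nothing
useful while it waits:

```python
def lost_issue_slots(stall_cycles, issue_width):
    """Instruction slots wasted while the processor waits on memory."""
    return stall_cycles * issue_width

ISSUE_WIDTH = 2  # 2-way superscalar, as in the quoted example

# 500MHz processor against 100MHz SDRAM: CPU cycles per memory clock.
cycles_per_memory_clock = 500 // 100

nick_estimate = lost_issue_slots(cycles_per_memory_clock, ISSUE_WIDTH)
table_low = lost_issue_slots(80, ISSUE_WIDTH)
table_high = lost_issue_slots(112, ISSUE_WIDTH)

print("quoted estimate:", nick_estimate, "instruction slots")
print("table figures:  ", table_low, "to", table_high, "instruction slots")
```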

> I used to think that this was a "RISC Processor only" argument.
> But it seems (no hard numbers yet) that Pentium at least follows
> same pattern.

It does.  However, my understanding is that the Pentium stops the
pipeline if it has to do a TLB walk, which it then does in hardware.
This mitigates the effect of the TLB miss somewhat.

Alan Burlison
