Chris Pile wrote:
> In short, then, I guess I should ignore what Zaks is telling me 
> regarding T-states and simply try to use single/double byte 
> instructions wherever and whenever possible?

As a very rough guide, round the official instruction timings up to the next 
multiple of 4, then double it if executing over the main screen.

For a slightly better approximation, start with the official instruction 
timing.  Then add 1T for every memory access (or 5T if executing over the main 
screen), and subtract 1T from the total.  Don't forget to count the opcode, 
operand and data fetches made during execution as memory accesses.
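
The two rules of thumb above can be written down as a couple of throwaway 
Python functions.  This is only a sketch: the function names are made up for 
illustration, and the memory access count is something you supply by hand (3 
for JP nn: one opcode fetch plus two operand reads).

```python
def rough_guide(official_t, main_screen=False):
    """Very rough guide: round up to a multiple of 4T, double over the main screen."""
    rounded = -(-official_t // 4) * 4   # ceiling to the next multiple of 4
    return rounded * 2 if main_screen else rounded


def better_guess(official_t, memory_accesses, main_screen=False):
    """Official timing, plus 1T (or 5T) per memory access, minus 1T overall."""
    per_access = 5 if main_screen else 1
    return official_t + memory_accesses * per_access - 1


# JP nn: official 10T, three memory accesses (opcode, operand low, operand high)
print(rough_guide(10), rough_guide(10, main_screen=True))          # 12 24
print(better_guess(10, 3), better_guess(10, 3, main_screen=True))  # 12 24
```

Both are approximations, so expect them to drift by a T-state or so on longer 
instructions.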

Alternatively, just single-step real code in SimCoupe to see how long each 
instruction takes!


For anything more accurate you need to look at the instruction timing breakdown 
of what is happening during each machine cycle.  The true total timing is just 
the sum of the parts, with contention delays added before each RAM (or 
contended I/O) access.  Looking at the JP nn case from 
http://www.z80.info/z80ins.txt:

JP nn = OCF(4)  ODL(3)  ODH(3)

 4T for opcode fetch
 3T to read operand data low
 3T to read operand data high

That gives 10T for the official timing.  For the SAM comparison it's simplest 
to start at the same first step (the opcode fetch itself) and measure up to the 
same point in the following instruction, rather than starting with the 
contention delay before the opcode fetch:

 4T for opcode fetch
 ?T delay for contended operand RAM access
 3T to read the low byte of the operand data
 ?T delay for contended operand RAM access
 3T to read the high byte of the operand data
 ?T delay for contended opcode RAM access before next opcode

With JP nn there are three points where the contention delays are applied.  The 
delay values depend on what the ASIC is doing at the time.  The ASIC restricts 
us to 1 access every 4T in the border and 1 access every 8T over the main 
screen.  This means rounding up the cycle position to a multiple of 4T or 8T at 
each contention point.

In the best case for border area execution this gives:

 4T for opcode fetch
 0T delay (rounding to multiple of 4, nothing needed in this case)
 3T to read the low byte of the operand data
 1T delay (rounding to multiple of 4)
 3T to read the high byte of the operand data
 1T delay (rounding to multiple of 4)
=12T

In the worst case for main screen execution this gives:

 4T for opcode fetch
 4T delay (rounding to multiple of 8)
 3T to read the low byte of the operand data
 5T delay (rounding to multiple of 8)
 3T to read the high byte of the operand data
 5T delay (rounding to multiple of 8)
=24T
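
The two walk-throughs above can be reproduced with a few lines of Python.  
Treat this as a sketch under two assumptions: the instruction starts on a 
rounding boundary (which is what gives the best and worst cases shown), and 
every machine cycle in the list is followed by a contended RAM access.  The 
names round_up and walk are just for illustration.

```python
def round_up(t, multiple):
    """Round t up to the next multiple (unchanged if already aligned)."""
    return -(-t // multiple) * multiple


def walk(cycles, rounding):
    """Return (delays, total) for a list of machine cycle lengths.

    After each cycle the position is rounded up before the next access,
    the last delay being the one before the following opcode fetch.
    """
    t = 0
    delays = []
    for length in cycles:
        t += length
        aligned = round_up(t, rounding)
        delays.append(aligned - t)
        t = aligned
    return delays, t


JP_NN = [4, 3, 3]          # OCF(4) ODL(3) ODH(3)

print(walk(JP_NN, 4))      # border:      ([0, 1, 1], 12)
print(walk(JP_NN, 8))      # main screen: ([4, 5, 5], 24)
```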

Each delay depends on where it's executed, which prevents there being a 
definitive list of SAM instruction timings.  Long instructions accessing memory 
have the biggest contention penalty.  LD IX,(nn) has 6 contention points, which 
in the worst case increases the official 20T timing to 48T!
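
The LD IX,(nn) figure can be checked the same way.  The machine cycle list 
below, OCF(4) OCF(4) ODL(3) ODH(3) MRL(3) MRH(3), is my reading of the usual 
Z80 breakdown for this instruction, so treat it as an assumption, along with 
the instruction starting on an 8T boundary.

```python
def contended_total(cycles, rounding):
    """Sum machine cycles, rounding the position up after each one."""
    t = 0
    for length in cycles:
        t += length
        t += -t % rounding     # delay needed to reach the next boundary
    return t


# Assumed breakdown: two opcode fetches, two operand reads, two data reads
LD_IX_NN = [4, 4, 3, 3, 3, 3]

print(sum(LD_IX_NN))                  # 20 (official timing)
print(len(LD_IX_NN))                  # 6  (contention points)
print(contended_total(LD_IX_NN, 8))   # 48 (worst case over the main screen)
print(contended_total(LD_IX_NN, 4))   # 24 (best case in the border)
```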

In addition to RAM contention there is also I/O contention to consider.  SAM 
ports &F8 to &FF are of interest to the ASIC, so there is 8T rounding applied 
before each IN/OUT on those ports.  As with the RAM accesses, this is applied 
at the point in the instruction where the I/O occurs.

It's not as simple as using the official Z80 timings, but you'll soon get a 
feel for it.  For best results, use short instructions and reduce memory 
accesses by keeping working values in registers.  In the border area, many 
common instructions are close to official timings – 4T single-byte instructions 
have no additional penalty, and the common 7T instructions only cost an 
additional 1T.


Quick overview:

ROM accesses are uncontended, so there are no additional delays, though any 
RAM accesses made from ROM code will still be affected, of course.

Internal RAM accesses have 4T rounding in the border area, and 8T when the ASIC 
is fetching display data to draw the main screen.  If the display is disabled 
there's no screen drawing, so 4T rounding applies everywhere.  In screen mode 1 
there are additional bands of 8T rounding across the entire display, 
alternating every 64T (see http://simonowen.com/sam/articles/mode1/ for 
details).

External RAM accesses have 4T rounding at all times.

I/O on ports &F8 to &FF has 8T rounding.
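
The overview boils down to a small lookup, sketched here in Python.  The 
function name and arguments are hypothetical, a rounding of 1 simply means "no 
delay", and treating ports outside &F8 to &FF as uncontended is my assumption.  
The extra mode 1 bands are deliberately not modelled.

```python
def access_rounding(kind, over_main_screen=False, display_enabled=True, port=None):
    """Return the T-state rounding applied before an access.

    kind is one of "rom", "internal_ram", "external_ram" or "io".
    port is the 8-bit port number, only used when kind is "io".
    """
    if kind == "rom":
        return 1                      # uncontended
    if kind == "external_ram":
        return 4                      # 4T rounding at all times
    if kind == "internal_ram":
        # 8T only while the ASIC is fetching main screen display data
        return 8 if (over_main_screen and display_enabled) else 4
    if kind == "io":
        # Assumption: only ports &F8 to &FF are contended
        return 8 if port is not None and 0xF8 <= port <= 0xFF else 1
    raise ValueError("unknown access kind: %r" % (kind,))


print(access_rounding("internal_ram", over_main_screen=True))   # 8
print(access_rounding("internal_ram", over_main_screen=True,
                      display_enabled=False))                   # 4
print(access_rounding("io", port=0xFE))                         # 8
```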


Clear as mud?

Si
