Here is a program I wrote some time ago to time instructions on the Sam.
Insert your favourite instruction(s) at line 30 (instead of the NOP that's
there) and run it, whereupon it should tell you how many T-states it takes.
(You may use any of the main registers, but don't include a byte 201 (RET)
unless you change line 60 of the program, which stops reading DATA at the
first 201.)

 10 CLEAR 59999
 20 DATA 33,0,0,6,10,217
 30 DATA 0
 40 DATA 217,16,60005-(x+1),43,124,181,32,60003-(x+1),201
 50 LET x=60000: RESTORE 20
 60 READ y: POKE x,y: LET x=x+1: IF y <> 201 THEN GO TO 60
 70 OUT 254,128
 80 PAUSE 1: DPOKE 23672,0: CALL 60000: LET t= DPEEK 23672
 90 BORDER 0
100 LET m=t*115850
110 LET i=(m/65536-32)/10-24
120 LET j= INT (i+.5): IF ABS (i-j)>.25 THEN PRINT "Some uncertainty exists"
130 PRINT j;" Tstates"
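
The arithmetic in lines 100-120 can be checked away from the Sam.  What
follows is a sketch in Python, not part of the original program.  The
constants are copied from the listing.  Reading 32 and 24 as loop overheads
is an interpretation based on the Sam times given further down: 32 would be
LD B,10 + DEC HL + LD A,H + OR L + JR NZ (8+8+4+4+12) less the 4 cycles saved
when the last DJNZ falls through, and 24 would be two EXX plus a taken DJNZ
(4+4+16).  The function name and the frame count 176 are made up for
illustration.

```python
import math

def tstates_from_frames(t, scale=115850, outer=65536, inner=10):
    """Mirror lines 100-120 of the BASIC listing.

    t is the value read back by DPEEK 23672 in line 80.
    scale, 32 and 24 are the constants used in the listing.
    """
    m = t * scale                        # line 100
    i = (m / outer - 32) / inner - 24    # line 110
    j = math.floor(i + 0.5)              # line 120: INT (i+.5)
    uncertain = abs(i - j) > 0.25        # line 120: the warning condition
    return j, uncertain

# 176 is a hypothetical reading, not a measured one.
j, uncertain = tstates_from_frames(176)
print(j, "Tstates", "(some uncertainty exists)" if uncertain else "")
```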

In modes 3 and 4, when the screen is turned off, the ASIC limits main memory
accesses (that is, accesses within the 512K of built-in RAM but not in the
ROM or the external RAM) to 1 every 4 clock cycles.  This also applies when
the screen is on and the TV scan is not in the middle of drawing something
that is on the screen.  If it is drawing something (this applies for 256
cycles in each scan line of 384 cycles and for 192 of every 312 scan lines)
then the ASIC limits memory accesses to 1 every 8 clock cycles.  I will
describe what happens in the former case, where one access is allowed
every 4 cycles.
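
As a quick check of the proportions quoted above, this Python sketch works
out how much of a frame is spent drawing the screen, assuming the screen is
turned on.  The variable names are invented here; the figures are the ones
in the paragraph.

```python
CYCLES_PER_LINE = 384        # clock cycles in one scan line
LINES_PER_FRAME = 312        # scan lines in one frame
DRAWN_CYCLES_PER_LINE = 256  # cycles per line spent drawing the screen
DRAWN_LINES = 192            # scan lines that contain screen data

frame_cycles = CYCLES_PER_LINE * LINES_PER_FRAME
drawn_cycles = DRAWN_CYCLES_PER_LINE * DRAWN_LINES

# Fraction of the frame during which the 8-cycle limit applies.
drawn_fraction = drawn_cycles / frame_cycles
print(frame_cycles, drawn_cycles, round(drawn_fraction, 3))
```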

Since the CPU clearly has to fetch each instruction from memory before
executing it, each instruction must start on a cycle number which is a
multiple of 4. For example, if the instructions INC DE:EXX are executed
then since INC DE takes 6 cycles the EXX must wait a further 2 cycles
before being executed.  [Aside: the memory access actually occurs on the
third cycle of an instruction, so what happens is that the CPU starts
fetching the EXX immediately but has to wait for 2 cycles during the
instruction fetch.]  We usually include the two cycles in the timing for
INC DE and say that INC DE takes 8 cycles, since that is more convenient.
Interestingly enough, INC DE has the distinction of being an instruction
that takes the same length of time whether the screen is being drawn or
not.

Most instructions that are not memory intensive and do not use I/O simply
have their times rounded up to the next multiple of 4 for the above reason.
For example:

  instruction     official time     Sam time
  INC r            4                 4
  INC rr           6                 8
  INC IX          10                12
  ADD HL,rr       11                12
  ADD IX,rr       15                16
  LD r,n           7                 8
  LD rr,nn        10                12
  JR cc,d         if cc 12 else 7   if cc 12 else 8
  JP cc,nn        10                12
  RET cc          if cc 11 else 5   if cc 12 else 8.
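
The rounding rule can be written as a one-line function.  This is a sketch
in Python with an invented name (sam_time); it only covers instructions of
the kind listed above, that is, ones that are not memory intensive and do
not use I/O.

```python
def sam_time(official):
    """Round an official T-state count up to the next multiple of 4."""
    return -(-official // 4) * 4

# A few rows from the table above: (instruction, official time).
rows = [("INC r", 4), ("INC rr", 6), ("INC IX", 10),
        ("ADD HL,rr", 11), ("ADD IX,rr", 15), ("LD r,n", 7)]

for name, official in rows:
    print(f"{name:10} {official:3} {sam_time(official):3}")
```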

Instructions that are memory intensive sometimes take more time.  This
depends on what each instruction does.  The Z80 is usually, though not
always, predictable in the amount of time it takes to do something.  For
example (the left-hand column gives letters by which these actions will
be referred to later on):

ref  action             time
 F   instruction fetch  4    [includes execution time for simple loads & ALU]
 A   memory access      3
 L   8-bit ALU          1
 I   16-bit inc/dec     2    [except PC, and SP during stack operations]
 J   relative jump      5
 X   add d to IX        5.

A memory access and increment operation, which happens during instruction
fetches, double byte memory fetches, block operations and stack operations,
takes only 3 cycles, presumably because an increment circuit is built in to
the memory access path of the Z80.  In the case of the PUSH instruction the
stack pointer has to be decremented before the first memory access; this
takes 1 cycle (referred to as D below).  The same applies to the DEC BC
step of an LDIR instruction.

So the following instruction timings result.  Elements of the form w2 in
the right-hand column denote cycles during which the CPU has to wait for a
memory access.

  instruction     official time     Sam time
  PUSH rr         F+D+A+A = 11      F+D+w3+A+w1+A+w1 = 16
  POP rr          F+A+A = 10        F+A+w1+A+w1 = 12
  CALL cc,nn      F+A+A+D+A+A = 17  F+A+w1+A+D+A+w1+A+w1 = 20    [if cc]
  CALL cc,nn      F+A+A = 10        F+A+w1+A+w1 = 12             [if not cc]
  LD HL,(nn)      F+A+A+A+A = 16    F+A+w1+A+w1+A+w1+A+w1 = 20
  DJNZ d          F+L+A+J = 13      F+L+w3+A+J = 16              [if B>0]
  DJNZ d          F+L+A = 8         F+L+w3+A+w1 = 12             [if B=0]
  LDIR            F+F+A+A+1+D+J=21  F+F+A+w1+A+1+D+J+w2 = 24     [if BC>0]
  LDIR            F+F+A+A+1+D = 16  F+F+A+w1+A+1+D+w3 = 20       [if BC=0]

(Since LDIR and OTIR officially take the same amount of time, and since an
I/O operation takes one cycle longer than a memory access, the Z80 must
for some reason insert an extra cycle into an LDIR; this is the 1 shown
above.)
Interestingly enough, if DE points to the ROM when an LDIR is carried out
then there are no wait states in the case that BC=0 and the operation takes
16 cycles.
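
The totals in the table can be checked mechanically.  The sketch below, in
Python, adds up a formula written in the notation used above.  It only knows
the letters that appear in that table (F, A, L, D, J), and it treats wN as N
wait cycles and a bare number as that many cycles.  The function name is
made up for illustration.

```python
COST = {"F": 4, "A": 3, "L": 1, "D": 1, "J": 5}

def total(formula):
    """Sum a formula such as 'F+D+w3+A+w1+A+w1'."""
    cycles = 0
    for term in formula.split("+"):
        if term in COST:
            cycles += COST[term]
        elif term.startswith("w"):
            cycles += int(term[1:])   # wN is N wait cycles
        else:
            cycles += int(term)       # a bare number, like the 1 in LDIR
    return cycles

# Two rows from the table: (official formula, Sam formula).
rows = {
    "PUSH rr":     ("F+D+A+A",       "F+D+w3+A+w1+A+w1"),
    "LDIR (BC>0)": ("F+F+A+A+1+D+J", "F+F+A+w1+A+1+D+J+w2"),
}
for name, (official, sam) in rows.items():
    print(name, total(official), total(sam))
```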

I/O operations are slightly different from memory fetches.  Officially
they take 4 cycles because the Z80's I/O cycle is the same as a memory
cycle but with an added wait state.  However, I/O ports 248-255 inclusive
are contended by the ASIC, which allows only one access every 8 cycles.
For this reason, the time taken by an I/O instruction depends upon where
it is in the program.  For example, OUT (254),A usually takes 12 cycles
(F+A+w1+O, where O is the I/O operation) but if two of them are executed in
sequence then the second one will take 16 cycles (F+A+w5+O).  Assuming that
each instruction starts on an 8-cycle boundary, we have the following.

  instruction     official time     Sam time
  OUT (C),r       F+F+O = 12        F+F+O = 12
  NOP:OUT (C),r   F+F+F+O = 16      F+F+F+w4+O = 20        [if c>247]
  OUT (n),A       F+A+O = 11        F+A+w1+O = 12          [if n>247]
  OUT (n),A       F+A+O = 11        F+A+O+w1 = 12          [if n<248]
  OTIR            F+F+A+O+L+J = 21  F+F+A+w5+O+L+J+w2 = 28 [if c>247 & b>0]
  OTIR            F+F+A+O+L = 16    F+F+A+w5+O+L+w3 = 24   [if c>247 & b=0]
  NOP:OTIR        F+F+F+A+O+L+J=25  F+F+F+A+w1+O+L+J+w2=28 [if c>247 & b>0]
  NOP:OTIR        F+F+F+A+O+L = 20  F+F+F+A+w1+O+L+w3 = 24 [if c>247 & b=0]
  OTIR            F+F+A+O+L+J = 21  F+F+A+O+L+J+w3 = 24    [if c<248 & b>0]
  OTIR            F+F+A+O+L = 16    F+F+A+O+L = 16         [if c<248 & b=0]
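
One way to read the table is as two rules: an access to a contended port
waits for the next 8-cycle boundary, and the instruction as a whole is then
padded so that the next fetch starts on a 4-cycle boundary.  That rule is
inferred from the rows above and is not a description of how the ASIC
works internally.  The Python sketch below (function and parameter names
invented here) reproduces the Sam times in the table under that assumption.

```python
O = 4  # cycles for the I/O operation itself

def sam_io_time(before, after, contended, start=0):
    """Cycles taken by an I/O instruction that begins at cycle `start`.

    before:    cycles of fetches and memory accesses preceding the I/O
    after:     cycles of work following the I/O (L, J)
    contended: True for ports 248-255
    """
    t = start + before
    if contended:
        t += (-t) % 8    # assumed: wait for the next 8-cycle boundary
    t += O + after
    t += (-t) % 4        # assumed: next fetch waits for a 4-cycle boundary
    return t - start

# Two OUT (254),A in sequence: F+A = 7 cycles come before the I/O.
first = sam_io_time(7, 0, True)
second = sam_io_time(7, 0, True, start=first)
print(first, second)
```

For the NOP:OUT and NOP:OTIR rows the 4 cycles of the NOP are counted in
`before`, since the table times the pair as one unit.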

And that's what I know about instruction timings.

imc
