Here is a program I wrote some time ago to time instructions on the Sam. Insert your favourite instruction(s) at line 30 (instead of the NOP that's there) and run it, whereupon it should tell you how many T-states they take. (Use any of the main registers, but don't include a 201 (RET) byte unless you change line 60 of the program, which stops POKEing at the first 201 it reads.)

    10 CLEAR 59999
    20 DATA 33,0,0,6,10,217
    30 DATA 0
    40 DATA 217,16,60005-(x+1),43,124,181,32,60003-(x+1),201
    50 LET x=60000: RESTORE 20
    60 READ y: POKE x,y: LET x=x+1: IF y <> 201 THEN GO TO 60
    70 OUT 254,128
    80 PAUSE 1: DPOKE 23672,0: CALL 60000: LET t= DPEEK 23672
    90 BORDER 0
    100 LET m=t*115850
    110 LET i=(m/65536-32)/10-24
    120 LET j= INT (i+.5): IF ABS (i-j)>.25 THEN PRINT "Some uncertainty exists"
    130 PRINT j;" Tstates"

In modes 3 and 4, when the screen is turned off, the ASIC limits main memory accesses (that is, accesses within the 512K of built-in RAM but not in the ROM or the external RAM) to 1 every 4 clock cycles. This also applies when the screen is on and the TV scan is not in the middle of drawing something that is on the screen. If it is drawing something (this applies for 256 cycles in each scan line of 384 cycles and for 192 of every 312 scan lines) then the ASIC limits memory accesses to 1 every 8 clock cycles. I will describe what happens in the former case.

Since the CPU clearly has to fetch each instruction from memory before executing it, each instruction must start on a cycle number which is a multiple of 4. For example, if the instructions INC DE:EXX are executed then since INC DE takes 6 cycles the EXX must wait a further 2 cycles before being executed. [Aside: the memory access actually occurs on the third cycle of an instruction, so what happens is that the CPU starts fetching the EXX immediately but has to wait for 2 cycles during the instruction fetch.] We usually include the two cycles in the timing for INC DE and say that INC DE takes 8 cycles, since that is more convenient. Interestingly enough, INC DE has the distinction of being an instruction that takes the same length of time whether the screen is being drawn or not.

Most instructions that are not memory intensive and do not use I/O simply have their times rounded up to the next multiple of 4 for the above reason.
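The rounding rule just described can be sketched in a few lines of Python. This is only an illustration, not part of the timing program: the names `sam_time` and `slot` are made up here, and using `slot=8` for the screen-being-drawn case is an assumption extrapolated from the "1 access every 8 cycles" remark, which the text does not work through in detail.

```python
def sam_time(official_t_states: int, slot: int = 4) -> int:
    """Round an official Z80 timing up to the next multiple of `slot` cycles.

    slot=4 models the case described in the text (screen off, or the scan
    not currently drawing). slot=8 is an ASSUMED model for the case where
    the screen is being drawn; it only applies to simple instructions that
    are not memory intensive and do no I/O.
    """
    return ((official_t_states + slot - 1) // slot) * slot


if __name__ == "__main__":
    print("INC DE:", sam_time(6))                     # 6 official cycles
    print("EXX:   ", sam_time(4))                     # 4 official cycles
    print("INC DE, screen drawn:", sam_time(6, slot=8))
```

With these assumptions INC DE comes out at 8 cycles for either slot size, which agrees with the remark that it takes the same time whether the screen is being drawn or not.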
For example:

    instruction   official time        Sam time
    INC r         4                    4
    INC rr        6                    8
    INC IX        10                   12
    ADD HL,rr     11                   12
    ADD IX,rr     15                   16
    LD r,n        7                    8
    LD rr,nn      10                   12
    JR cc,d       if cc 12 else 7      if cc 12 else 8
    JP cc,nn      10                   12
    RET cc        if cc 11 else 5      if cc 12 else 8

Instructions that are memory intensive sometimes take more time. This depends on what each instruction does. The Z80 is usually, though not always, predictable in the amount of time it takes to do something. For example (the left-hand column gives letters by which these actions will be referred to later on):

    ref   action              time
    F     instruction fetch   4      [includes execution time for simple loads & ALU]
    A     memory access       3
    L     8-bit ALU           1
    I     16-bit inc/dec      3      [except PC, and SP during stack operations]
    J     relative jump       5
    X     add d to IX         5

A memory access and increment operation, which happens during instruction fetches, double byte memory fetches, block operations and stack operations, takes only 3 cycles, presumably because an increment circuit is built into the memory access path of the Z80. In the case of the PUSH instruction the stack pointer has to be decremented before the first memory access; this takes 1 cycle (referred to as D below; this also applies to the DEC BC cycle of an LDIR instruction). So the following instruction timings result. Elements of the form w2 in the right-hand column denote cycles during which the CPU has to wait for a memory access.

    instruction   official time          Sam time
    PUSH rr       F+D+A+A = 11           F+D+w3+A+w1+A+w1 = 16
    POP rr        F+A+A = 10             F+A+w1+A+w1 = 12
    CALL cc,nn    F+A+A+D+A+A = 17       F+A+w1+A+D+A+w1+A+w1 = 20    [if cc]
    CALL cc,nn    F+A+A = 10             F+A+w1+A+w1 = 12             [if not cc]
    LD HL,(nn)    F+A+A+A+A = 16         F+A+w1+A+w1+A+w1+A+w1 = 20
    DJNZ d        F+L+A+J = 13           F+L+w3+A+J = 16              [if B>0]
    DJNZ d        F+L+A = 8              F+L+w3+A+w1 = 12             [if B=0]
    LDIR          F+F+A+A+1+D+J = 21     F+F+A+w1+A+1+D+J+w2 = 24     [if BC>0]
    LDIR          F+F+A+A+1+D = 16       F+F+A+w1+A+1+D+w3 = 20       [if BC=0]

(Since LDIR and OTIR take the same amount of time officially, and since an I/O operation takes one cycle longer than a memory access, the Z80 must for some reason insert the extra 1 into an LDIR, which is shown above.) Interestingly enough, if DE points to the ROM when an LDIR is carried out then there are no wait states in the case that BC=0 and the operation takes 16 cycles.

I/O operations are slightly different from memory fetches. Officially they take 4 cycles because the Z80's I/O cycle is the same as a memory cycle but with an added wait state. However, I/O ports 248-255 inclusive are contended by the ASIC, which allows only one access every 8 cycles. For this reason, the time taken by an I/O instruction depends upon where it is in the program. For example, OUT (254),A usually takes 12 cycles (F+A+w1+O, where O is the I/O operation) but if two of them are executed in sequence then the second one will take 16 cycles (F+A+w5+O). Assuming that each instruction starts on an 8-cycle boundary, we have the following.

    instruction     official time           Sam time
    OUT (C),r       F+F+O = 12              F+F+O = 12
    NOP:OUT (C),r   F+F+F+O = 16            F+F+F+w4+O = 20             [if c>247]
    OUT (n),A       F+A+O = 11              F+A+w1+O = 12               [if n>247]
    OUT (n),A       F+A+O = 11              F+A+O+w1 = 12               [if n<248]
    OTIR            F+F+A+O+L+J = 21        F+F+A+w5+O+L+J+w2 = 28      [if c>247 & b>0]
    OTIR            F+F+A+O+L = 16          F+F+A+w5+O+L+w3 = 24        [if c>247 & b=0]
    NOP:OTIR        F+F+F+A+O+L+J = 25      F+F+F+A+w1+O+L+J+w2 = 28    [if c>247 & b>0]
    NOP:OTIR        F+F+F+A+O+L = 20        F+F+F+A+w1+O+L+w3 = 24      [if c>247 & b=0]
    OTIR            F+F+A+O+L+J = 21        F+F+A+O+L+J+w3 = 24         [if c<248 & b>0]
    OTIR            F+F+A+O+L = 16          F+F+A+O+L = 16              [if c<248 & b=0]

And that's what I know about instruction timings.

imc
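For anyone who wants to check the sums in the tables above, or work out a timing for an instruction that isn't listed, the notation is easy to evaluate mechanically. The following is a minimal Python sketch, not anything that runs on the Sam: the names `COSTS` and `total_cycles` are invented for illustration, the per-action costs are taken from the reference table above (with O = 4 for an I/O operation), and a term like w3 is simply read as 3 wait cycles.

```python
import re

# Cycle cost of each action letter, as given in the text (O = I/O operation).
COSTS = {"F": 4, "A": 3, "L": 1, "I": 3, "J": 5, "X": 5, "D": 1, "O": 4}


def total_cycles(expr: str) -> int:
    """Sum a timing expression such as 'F+D+w3+A+w1+A+w1'.

    Terms may be an action letter from COSTS, a wait term 'wN' (N cycles
    spent waiting for a memory or port access), or a bare number such as
    the extra '1' that appears in the LDIR rows.
    """
    total = 0
    for term in expr.split("+"):
        term = term.strip()
        if term in COSTS:
            total += COSTS[term]
        elif re.fullmatch(r"w\d+", term):
            total += int(term[1:])
        elif term.isdigit():
            total += int(term)
        else:
            raise ValueError(f"unknown term in timing expression: {term!r}")
    return total


if __name__ == "__main__":
    for name, expr in [
        ("PUSH rr (official)", "F+D+A+A"),
        ("PUSH rr (Sam)", "F+D+w3+A+w1+A+w1"),
        ("LDIR, BC>0 (Sam)", "F+F+A+w1+A+1+D+J+w2"),
        ("OTIR, c>247 & b>0 (Sam)", "F+F+A+w5+O+L+J+w2"),
    ]:
        print(name, "=", total_cycles(expr))
```

Fed the rows of the tables, the Sam-time column always comes out as a multiple of 4, which is what the one-access-every-4-cycles rule would lead you to expect.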

