Having thought about it this afternoon, I've decided that, since this
table is absolutely fundamental to my program (it's part of the
table-based 2.8 x 2.8 multiply), I'm just going to centre it on
address 0. Then I can cut the ld bc and the add hl, bc. I'm happy to
rearrange my table, so I guess the comparison is between:
    add hl, hl
    ld e, (hl)
    inc l
    ld d, (hl)
and:
    sla h
    ld e, (hl)
    inc h
    ld d, (hl)
Obviously the second is faster, but I'm curious about your cycle
counts. All the Z80 documentation I have lists "add hl, ss" as 11
t-states. Why have you turned that into 16? From your other
calculations it looks as though you're just rounding up to the next
multiple of 4 (which may be a good rule of thumb for the Sam; I don't
know, I've just been letting Sim Coupe work it out), so why isn't it 12?
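To make the comparison concrete, here is a small Python tally of both sequences, once with the nominal Zilog t-state counts and once with every instruction rounded up to a multiple of 4. That rounding is only the rule of thumb inferred from the counts above, not a model of SAM memory contention, and the names (NOMINAL, tally, etc.) are illustrative.

```python
# Nominal Z80 t-state counts (Zilog documentation) for the instructions used.
NOMINAL = {
    "add hl,hl": 11,
    "ld e,(hl)": 7,
    "ld d,(hl)": 7,
    "inc l": 4,
    "inc h": 4,
    "sla h": 8,  # CB-prefixed shift
}


def round_up_to_4(t):
    """Rule of thumb only: round one instruction up to a multiple of 4."""
    return (t + 3) // 4 * 4


def tally(seq):
    """Return (nominal total, rounded total) for a list of instructions."""
    nominal = sum(NOMINAL[op] for op in seq)
    rounded = sum(round_up_to_4(NOMINAL[op]) for op in seq)
    return nominal, rounded


# Word-pair table at address 0.
INTERLEAVED = ["add hl,hl", "ld e,(hl)", "inc l", "ld d,(hl)"]
# Rearranged table: separate pages of low and high bytes.
SPLIT_PAGES = ["sla h", "ld e,(hl)", "inc h", "ld d,(hl)"]

if __name__ == "__main__":
    for name, seq in (("interleaved", INTERLEAVED), ("split pages", SPLIT_PAGES)):
        n, r = tally(seq)
        print(f"{name}: {n} nominal, {r} rounded")
```

Under these assumptions the first sequence comes to 29 nominal / 32 rounded t-states and the second to 26 / 28. Counting add hl,hl as 16 instead of 12 would put the first sequence at 36.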
Anyway, with the table moved to 0 and rearranged, I have a
reasonably accurate 2.8 x 2.8 multiply that runs in just 109 paper
cycles, or between 152 and 268 Sim Coupe cycles. Annoyingly, I
currently have the screen in the low 32 KB and my program in the high
32 KB, so there's a whole bunch of things to change before I can see
what overall effect this has on my frame rate, compared with the
current 200 to 304 Sim Coupe cycles...
On 20 May 2008, at 17:28, Andrew Collier wrote:
On Tue, May 20, 2008 at 03:22:54PM +0100, Andrew Collier wrote:
correction:
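    ld a,h            ;  4  A = high byte of the index
    sla l             ;  8  double L, its bit 7 into carry
    rla               ;  4  double A, carry into bit 0
    add a,table/256   ;  8  add the table's page (table is 256-byte aligned)
    ld h,a            ;  4  HL = table + 2 * index
                      ; = 28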
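As a cross-check of the sequence above, here is a small Python model of the register arithmetic. It's a sketch that assumes the table starts on a 256-byte boundary (only the table's high byte is added to A), and table_address and add_hl_hl are hypothetical names. It shows that the five instructions leave HL = table + 2 * index, and that with the table at address 0 the result is the same as a single add hl,hl.

```python
# Per-instruction counts as given in the listing above (rounded figures).
COUNTS = {"ld a,h": 4, "sla l": 8, "rla": 4, "add a,table/256": 8, "ld h,a": 4}


def table_address(hl, table):
    """Model the five-instruction sequence on 8-bit registers plus carry.

    Assumes `table` is 256-byte aligned, since only its high byte is added.
    """
    h, l = hl >> 8, hl & 0xFF
    a = h                            # ld a,h
    carry = l >> 7                   # sla l: bit 7 of L goes into carry...
    l = (l << 1) & 0xFF              # ...and L shifts left, bit 0 cleared
    a = ((a << 1) | carry) & 0xFF    # rla: A shifts left, carry into bit 0
    a = (a + (table >> 8)) & 0xFF    # add a,table/256: add the table's page
    h = a                            # ld h,a
    return (h << 8) | l


def add_hl_hl(hl):
    """add hl,hl: 16-bit doubling, wrapping at 64 KB."""
    return (hl * 2) & 0xFFFF
```

With table = 0 the page addition contributes nothing, so the whole sequence reduces to the 16-bit doubling that add hl,hl does on its own.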
You can also shave a little more time if you're willing to rearrange
the table. Instead of word pairs (low byte, high byte, low byte, high
byte), you could have alternating 256-byte pages: 256 low bytes, then
256 high bytes. To use that, double the high byte of the address but
leave the low byte unchanged (in other words, don't run the 'sla l' at
all, saving 8 t-states). Then, when reading DE from the table,
increment H instead of L to get the high byte that corresponds to the
selected low byte.
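One way to check that the two layouts are interchangeable is to model 64 KB of memory in Python and run both lookups against it. This is a sketch under stated assumptions: the table starts at address 0, only non-negative indices below 0x8000 are modelled (centring the table on 0 is not), and the function names and sample contents (a quarter-squares-style table) are hypothetical.

```python
MEM_SIZE = 0x10000  # 64 KB Z80 address space


def build_interleaved(values):
    """Word pairs at address 0: entry i at 2*i (low) and 2*i + 1 (high)."""
    mem = bytearray(MEM_SIZE)
    for i, v in enumerate(values):
        mem[2 * i] = v & 0xFF
        mem[2 * i + 1] = v >> 8
    return mem


def build_split(values):
    """Alternating pages at address 0: for entry i = H:L the low byte is
    on page 2H and the high byte on page 2H + 1, both at offset L."""
    mem = bytearray(MEM_SIZE)
    for i, v in enumerate(values):
        h, l = i >> 8, i & 0xFF
        mem[((2 * h) << 8) | l] = v & 0xFF
        mem[((2 * h + 1) << 8) | l] = v >> 8
    return mem


def lookup_interleaved(mem, index):
    hl = (index * 2) & 0xFFFF                   # add hl,hl
    e = mem[hl]                                 # ld e,(hl)
    hl = (hl & 0xFF00) | ((hl + 1) & 0xFF)      # inc l (L is even, never wraps)
    d = mem[hl]                                 # ld d,(hl)
    return (d << 8) | e


def lookup_split(mem, index):
    h, l = index >> 8, index & 0xFF
    h = (h << 1) & 0xFF                         # sla h (index must be < 0x8000)
    e = mem[(h << 8) | l]                       # ld e,(hl)
    h = (h + 1) & 0xFF                          # inc h
    d = mem[(h << 8) | l]                       # ld d,(hl)
    return (d << 8) | e


if __name__ == "__main__":
    values = [((i * i) // 4) & 0xFFFF for i in range(4096)]
    a, b = build_interleaved(values), build_split(values)
    print(all(lookup_interleaved(a, i) == lookup_split(b, i) == v
              for i, v in enumerate(values)))
```

For a whole number of 256-entry pages both layouts use the same 2N bytes. Only the order of the bytes changes, which is what lets inc h replace inc l and sla h replace the full 16-bit doubling.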
HTH,
Andrew
--
--- Andrew Collier ----
---- http://www.intensity.org.uk/ ---