Dear users and developers,

this is my first mail to this mailing list, and it is also the first time I've come back to Z80 assembly after a pause of a decade and a half...
I know I'm late, because 8x8-bit multiplication on the Z80 is "Now replaced by a builtin for code generation, but"... I have to start somewhere. I'm trying to replace the code for "__muluchar_rrx_s::" in device/lib/z80/mulchar.s with faster code.

If I'm not wrong, the current code uses (342 + 6*b) T-states (call and ret excluded), where b is the number of bits set to "1" in the second operand (the one loaded into L by my code). Mine uses (356 - b). This means the current code is faster than mine only when the second operand is zero or a power of two (b <= 1), the two take the same time when b = 2, and the current code is slower whenever the second operand has at least three bits set to "1" (219 of the 256 values in 0-255, about 85%).

Advantages of the new code:
 - faster most of the time, saving 14 T-states on average (b averages 4 over 0-255, giving 366 vs 352 T-states);
 - does not touch DE;
 - can easily be modified to return its result in DE (or BC).
Drawbacks:
 - overwrites the accumulator A.
Unchanged:
 - same memory footprint (2 bytes can be saved, but that costs 30 T-states).

The patch follows (it applies to sdcc 2.9.0):
---------8<---------8<---------8<---------8<---------8<---------8<-----
--- sdcc/device/lib/z80/mulchar.s	2009-01-05 11:20:47.000000000 +0100
+++ mulchar.s	2009-03-27 11:31:35.000000000 +0100
@@ -1,25 +1,38 @@
         .area   _CODE
 
-; This multiplication routine is similar to the one
-; from Rodnay Zaks, "Programming the Z80".
-
+;; Multiply two 8-bit operands, giving a 16-bit result
+;; by Marco Bodrato, March 27, 2009, licensed GPLv2+
+;; Before:
+;;  On-stack: return address, operands
+;; After:
+;;  B=0
+;;  HL=result
+;;  A=H, F=[H=N=Cy=0,(P,S,Z, depend on L)]
+;;
+;; Timings:
+;;  Total cycles needed depend on z=the number of "0" bits in L;
+;;  returns after (348+z).
+;; Notes: HL can be replaced by DE (or, a bit trickier, BC)
+
 ; Now replaced by a builtin for code generation, but
 ; still called from some asm files in this directory.
 __muluchar_rrx_s::
-        ld      hl, #2+1
-        add     hl, sp
-        ld      e, (hl)
-        dec     hl
-        ld      h, (hl)
-        ld      l, #0
-        ld      d, l
+        pop     af
+        pop     hl              ; Load operands
+        push    hl              ; and recover stack
+        push    af
+        ;; registers H and L now store the two operands
+        xor     a
         ld      b, #8
 muluchar_rrx_s_loop:
-        add     hl, hl
+        rr      l
         jr      nc, muluchar_rrx_s_noadd
-        add     hl, de
+        add     a, h
 muluchar_rrx_s_noadd:
+        rra
         djnz    muluchar_rrx_s_loop
+        rr      l               ; result is in AL, now...
+        ld      h, a
         ret
 
 ; operands have different sign
---------8<---------8<---------8<---------8<---------8<---------8<-----

Let me know if you find it interesting! If you do, I can also try to optimise the code generation (but I'll need some hints), and maybe other multiplication routines...

Regards,
Marco

-- 
http://bodrato.it/software/

_______________________________________________
Sdcc-user mailing list
Sdcc-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sdcc-user
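To double-check the cycle counts and the 85% figure quoted in this message, here is a small C model of both loops. It is an editorial sketch, not part of the patch: it simulates each routine instruction by instruction (registers and carry flag only), charges the standard Zilog T-state count for every instruction (taken and not-taken jr/djnz included), and assumes the byte at SP+2 is the multiplier (H in the current code, L in the proposed one). The names mul_current, mul_proposed and count_faster are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Current routine (Zaks style): HL = H:00, DE = 00:E; shift HL left eight
   times, adding DE whenever the bit shifted out of HL was 1.
   *t receives the T-states, call and ret excluded. */
static uint16_t mul_current(uint8_t h, uint8_t e, unsigned *t)
{
    uint16_t hl = (uint16_t)(h << 8);                   /* ld h,(hl) ; ld l,#0 */
    uint16_t de = e;                                    /* ld e,(hl) ; ld d,l  */
    unsigned cycles = 10 + 11 + 7 + 6 + 7 + 7 + 4 + 7;  /* setup up to ld b,#8 */

    for (int b = 8; b > 0; b--) {
        int cy = (hl & 0x8000u) != 0;                   /* add hl,hl: CY = bit 15 */
        hl = (uint16_t)(hl << 1);
        cycles += 11;
        if (cy) {
            hl = (uint16_t)(hl + de);                   /* jr nc not taken, add hl,de */
            cycles += 7 + 11;
        } else {
            cycles += 12;                               /* jr nc taken */
        }
        cycles += (b > 1) ? 13 : 8;                     /* djnz taken / falls through */
    }
    *t = cycles;
    return hl;
}

/* Proposed routine: multiplier in L, multiplicand in H; the product is
   built in A:L by rotating right through the carry flag. */
static uint16_t mul_proposed(uint8_t l, uint8_t h, unsigned *t)
{
    uint8_t a = 0;                                      /* xor a: A = 0, CY = 0 */
    int cy = 0, out;
    unsigned cycles = 10 + 10 + 11 + 11 + 4 + 7;        /* pops, pushes, xor a, ld b,#8 */

    for (int b = 8; b > 0; b--) {
        out = l & 1;                                    /* rr l: CY -> bit 7, bit 0 -> CY */
        l = (uint8_t)((l >> 1) | (cy << 7));
        cy = out;
        cycles += 8;
        if (cy) {
            unsigned sum = (unsigned)a + h;             /* jr nc not taken, add a,h */
            a = (uint8_t)sum;
            cy = sum > 0xFFu;
            cycles += 7 + 4;
        } else {
            cycles += 12;                               /* jr nc taken */
        }
        out = a & 1;                                    /* rra: CY -> bit 7, bit 0 -> CY */
        a = (uint8_t)((a >> 1) | (cy << 7));
        cy = out;
        cycles += 4 + ((b > 1) ? 13 : 8);               /* rra, then djnz */
    }
    l = (uint8_t)((l >> 1) | (cy << 7));                /* final rr l */
    cycles += 8 + 4;                                    /* rr l, ld h,a */
    *t = cycles;
    return (uint16_t)((a << 8) | l);                    /* HL = A:L */
}

/* Runs every operand pair through both models, checks the products and the
   two timing formulas, and returns how many multiplier values make the
   proposed routine strictly faster. */
static int count_faster(void)
{
    int faster = 0;
    for (unsigned m = 0; m < 256; m++) {
        unsigned bits = 0, tc = 0, tp = 0;
        for (unsigned v = m; v; v >>= 1)
            bits += v & 1u;
        for (unsigned x = 0; x < 256; x++) {
            assert(mul_current((uint8_t)m, (uint8_t)x, &tc) == (uint16_t)(m * x));
            assert(mul_proposed((uint8_t)m, (uint8_t)x, &tp) == (uint16_t)(m * x));
        }
        assert(tc == 342u + 6u * bits);                 /* current: 342 + 6*b */
        assert(tp == 356u - bits);                      /* proposed: 356 - b  */
        if (tp < tc)
            faster++;
    }
    return faster;
}
```

Called from a test main(), count_faster() exercises all 65536 operand pairs and returns 219, the number of multipliers with at least three bits set. Returning HL by value and passing the cycle counter through a pointer keeps each model close to the register interface of the assembly routine.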