Dear users and developers,

This is my first message to this mailing list, and also my first return
to Z80 assembly after a break of a decade and a half...

I know I'm late, because 8x8-bit multiplication on the Z80 is "Now replaced
by a builtin for code generation, but"... still, I have to start from
somewhere...

I'm trying to replace the code of "__muluchar_rrx_s::" in
device/lib/z80/mulchar.s with something faster.
If I'm not mistaken, the current code takes (342+6*b) T-states (call and
ret excluded), where b is the number of bits set to "1" in the second
operand.
Mine takes (356-b). Since 342+6*b < 356-b only for b < 2, the current code
is faster than mine only when the second operand is zero or a power of
two. The two tie when it has exactly two bits set, and mine is faster every
time it has at least three bits set to "1" (219 of the 256 values in
0-255, about 85%).

Advantages of the new code:
 - faster in most cases, saving 14 T-states on average (over all 256
   values of the second operand)
 - does not clobber DE
 - can easily be modified to return its result in DE (or BC)
Drawbacks:
 - overwrites the accumulator A
Unchanged:
 - same memory footprint (2 more bytes could be saved, at a cost of 30 T-states)

The patch follows (it applies to sdcc 2.9.0):
---------8<---------8<---------8<---------8<---------8<---------8<-----
--- sdcc/device/lib/z80/mulchar.s       2009-01-05 11:20:47.000000000 +0100
+++ mulchar.s   2009-03-27 11:31:35.000000000 +0100
@@ -1,25 +1,38 @@
         .area   _CODE

-; This multiplication routine is similar to the one
-; from Rodnay Zaks, "Programming the Z80".
-
+;; Multiply two 8-bit operands, giving a 16-bit result
+;; by Marco Bodrato, March 27, 2009, licensed GPLv2+
+;; Before:
+;;     On-stack:       return address, operands
+;; After:
+;;     B=0
+;;     HL=result
+;;     A=H, F=[H=N=Cy=0,(P,S,Z, depend on L)]
+;;
+;; Timings:
+;;     Total cycles needed depend on z=the number of "0" bits in L;
+;;     returns after (348+z).
+;; Notes:      HL can be replaced by DE (or, a bit trickier, BC)
+
 ; Now replaced by a builtin for code generation, but
 ; still called from some asm files in this directory.
 __muluchar_rrx_s::
-        ld      hl, #2+1
-        add     hl, sp
-        ld      e, (hl)
-        dec     hl
-        ld      h, (hl)
-        ld      l, #0
-        ld      d, l
+        pop     af
+        pop     hl     ; Load operands
+        push    hl     ; and recover stack
+        push    af
+        ;; registers H and L now store the two operands
+        xor     a
         ld      b, #8
 muluchar_rrx_s_loop:
-        add     hl, hl
+        rr      l
         jr      nc, muluchar_rrx_s_noadd
-        add     hl, de
+        add     a, h
 muluchar_rrx_s_noadd:
+        rra
         djnz    muluchar_rrx_s_loop
+        rr      l      ; result is in AL, now...
+        ld      h, a
         ret

 ; operands have different sign
---------8<---------8<---------8<---------8<---------8<---------8<-----

Let me know if you find it interesting! If you do, I can also try to
optimise the code generation (though I'll need some hints) and maybe other
multiplication routines...

Regards,
Marco

-- 
http://bodrato.it/software/


------------------------------------------------------------------------------
_______________________________________________
Sdcc-user mailing list
Sdcc-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sdcc-user
