Getting back into z80 programming (code included)

2009-10-24 Thread Thomas Harte
Having prototyped my new polygon filler to my satisfaction in C, today
I've been converting it to assembler. With the iPhone stuff and an
Acorn Electron project I've been working on, I haven't done any z80 in
far too long and am not particularly optimistic that I'm writing good
stuff. Actually, it strikes me that I've never really shown any z80
code to anyone, so maybe I'm just not great in general.

Below is most of my new polygon filler. It's incomplete, but only in
relatively minor ways. The scan converter handles only edges where x
increases (edges where x decreases will be the same code with subs
and decs rather than adds and incs; I thought I'd leave that until I'm
more confident in the whole thing). It also just chucks pixels on the
screen to show the scanline ends rather than drawing an actual scanline
of pixels (for which I'll be subverting SP in the usual way).

When calculating y intercepts it breaks down to either traditional
Bresenham for lines that change in y more than in x or run-slice
Bresenham for lines that change in x more than in y. Part of the
reasoning for that is that it lets me compare the speeds of the two
approaches. If run-slice does turn out to be faster than standard for
lines above a certain length (probably 9 or 10 pixels?), as I suspect,
then obviously I'll use it for both.
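
For readers comparing the two approaches, here is a rough Python model of building the per-scanline table for an edge that changes more in x than in y. It is not the posted Z80, and the rounding (always down) is an assumption rather than what the real filler does. The function names are illustrative only; both functions assume dx >= 0 and dy > 0.

```python
def intercepts_per_pixel(dx, dy):
    """x offset on each scanline 0..dy, advancing x one pixel at a time."""
    xs, x, err, steps = [0], 0, 0, 0
    for _ in range(dy):
        err += dx
        while err >= dy:        # one pass of this loop per pixel moved in x
            err -= dy
            x += 1
            steps += 1
        xs.append(x)
    return xs, steps


def intercepts_run_slice(dx, dy):
    """Same table, built from one whole run per scanline plus an error term."""
    whole, frac = divmod(dx, dy)    # a single division up front
    xs, x, err = [0], 0, 0
    for _ in range(dy):
        x += whole
        err += frac
        if err >= dy:           # at most one correction per scanline
            err -= dy
            x += 1
        xs.append(x)
    return xs


# An example edge with dx = 57 and dy = 10.
per_pixel, inner_steps = intercepts_per_pixel(57, 10)
run_slice = intercepts_run_slice(57, 10)
```

Both functions produce the same table. The per-pixel version does one inner step per pixel of x travel, while the run-slice version does a fixed amount of work per scanline after the initial division. Where the break-even point falls on a real Z80 depends on the cost of that division, which this model does not try to capture.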

Anyway, if some of you z80 experts could have a quick look and tell me
if I'm making any obvious style errors or otherwise missing obvious
optimisations — even if only on a peephole level — I'd be infinitely
grateful. Sorry if the comments are occasionally a bit opaque; some of
them just document which registers are holding which variables from
the original C.

Thanks in advance!




;
;   DrawPoly - draws a filled polygon using A vertices, in two arrays
;   with x positions starting at (H:0) and y positions at (H+1:0)
;
;   clobbers: af, bc, de, hl, af', bc', de', hl'
;

DS ALIGN 256

LEFTTAB:
ds 256
RIGHTTTAB:
ds 256

NUMVERTS:
db 0
VERTEXPOINTER:
dw 0

STARTY:
db 0
ENDY:
db 0

DrawPoly:

; store stuff
ld (VERTEXPOINTER), hl
ld (NUMVERTS), a
inc h
ld e, a
ld d, a

; use b to store current highest vertex pointer, c to store value
ld l, 0
ld b, 0
ld c, (hl)

; get highest vertex pointer to b

@highloop:
inc l   ; look at next y value

; check if loop is over yet, exit if so
dec d
jr z, @+highloopdone

ld a, (hl)  ; load new y value
cp c    ; compare to current highest
jr nc, @-highloop   ; don't do anything if it is lower

ld b, l
ld c, a
jr @-highloop

@highloopdone:

; highest value is now in c
ld a, c
ld (ENDY), a

; use c to store current lowest vertex pointer, d to store value
ld l, 0
ld c, 0
ld d, (hl)

; get lowest vertex pointer to c
@lowloop:
inc l   ; look at next y value

; check if loop is over yet, exit if so
dec e
jr z, @+lowloopdone

ld a, (hl)
cp d
jr c, @-lowloop

ld c, l
ld d, a
jr @-lowloop

@lowloopdone:

; lowest value is now in d
ld a, d
ld (STARTY), a

push bc ; b = current vertex, c = target

ld hl, RIGHTTTAB

@leftloop:
ld a, b
cp c
jr z, @+leftloopdone

dec a
jp p, @+noreload

ld a, (NUMVERTS)
dec a

@noreload:

call @+PushToArray
ld b, a
jr @-leftloop

@leftloopdone:

pop bc
ld hl, LEFTTAB
ld d, (NUMVERTS)

@rightloop:
ld a, b
cp c
jr z, @+rightloopdone

inc a
cp d
jr nz, @+noreload

xor a

@noreload:

call @+PushToArray
ld b, a
jr @-rightloop

@rightloopdone:

;
; page in the screen, for drawing
;

LD C, HMPR
IN a, (C)
push af
ld a, (rampage)
OUT (C), a

ld h, LEFTTAB >> 8
ld a, (ENDY)
ld l, a

ld a, (STARTY)
sub l
ld b, a

@plotloop:

; left pixel
ld a, (hl)
inc h

; right pixel
ld c, (hl)
inc l

dec h

ld d, l
ld e, a
srl d
rr e
jr nc, @+rpx

ld a, 0x0f
 

Re: Getting back into z80 programming (code included)

2009-10-24 Thread Chris Pile

Just a quick glimpse at the code shows you can save 8 clocks every
iteration of your plotloop by getting rid of the SET 7,D instructions:
put an SCF before each SRL D and replace the SRL D with RR D.

But as you're not likely to be using this particular plotloop in the final
article it probably doesn't matter too much!

That said, using SCF / RR D instead of SRL D / SET 7,D is something
worth remembering for later!

Better still, keep your screens at address 0 and do away with the SCF's
altogether.
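
As a sanity check on the substitution, here is a small Python model of the two instruction sequences. It only models the shift and rotate behaviour and the standard documented Z80 T-state counts (SRL r, RR r and SET b,r at 8 each, SCF at 4). The helper names are made up for this sketch, and D is assumed to hold the row and E the x position, as in the plotloop.

```python
T_STATES = {"srl": 8, "rr": 8, "set": 8, "scf": 4}


def srl(value):
    """SRL r: shift right, bit 7 becomes 0, carry takes the old bit 0."""
    return value >> 1, value & 1


def rr(value, carry):
    """RR r: rotate right through carry."""
    return (carry << 7) | (value >> 1), value & 1


def address_srl_set(y, x):
    d, carry = srl(y)           # srl d
    e, carry = rr(x, carry)     # rr e
    d |= 0x80                   # set 7, d  (flags untouched)
    return (d << 8) | e, carry


def address_scf_rr(y, x):
    carry = 1                   # scf
    d, carry = rr(y, carry)     # rr d  (the set carry lands in bit 7)
    e, carry = rr(x, carry)     # rr e
    return (d << 8) | e, carry


cost_before = T_STATES["srl"] + T_STATES["rr"] + T_STATES["set"]
cost_after = T_STATES["scf"] + T_STATES["rr"] + T_STATES["rr"]
saving_per_address = cost_before - cost_after
```

The two routines return the same address and the same final carry for every 8-bit y and x, and the plotloop computes two addresses per iteration, which is where the 8 clocks come from.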

Rearranging your registers so that you don't need to keep calculating the
screen address is always a good optimisation...  But I take it you're going
to use the stack to shove the scanlines on-screen?  Which is even better!!

I've always been a big fan of using the stack!!  :-))

Chris.





@plotloop:

; left pixel
ld a, (hl)
inc h

; right pixel
ld c, (hl)
inc l

dec h

ld d, l
ld e, a
srl d
rr e
jr nc, @+rpx

ld a, 0x0f
jr @+pxd

@rpx:
ld a, 0xf0

@pxd:
set 7, d
ld (de), a

ld d, l
ld e, c
srl d
rr e
jr nc, @+rpx

ld a, 0x0e
jr @+pxd

@rpx:
ld a, 0xe0

@pxd:
set 7, d
ld (de), a

djnz @-plotloop



Re: Getting back into z80 programming (code included)

2009-10-24 Thread Thomas Harte
Oh, yeah, the plotloop is a complete placeholder, just so I can
compare output to the C prototype. That said, I'd done exactly the
same thing on scf versus set 7,d in my line drawing code (don't worry:
just once, outside the loop). I must find time to finish documenting
my vector drawing code as I'm sure that would only benefit from being
seen by somebody other than me.

The screen's at 32768 because I have multiplication tables occupying
the highest and lowest 2 KB of RAM. It's the usual "squared and divided
by 4" stuff, so by using that positioning I can do the relevant 16-bit
addition and subtraction in HL and then read the result directly from
RAM without any further computation.
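
The "squared and divided by 4" idea can be sketched in Python as follows. This is only a model of the arithmetic: the table size and function names are assumptions, and `abs()` stands in for the layout described above, where a negative difference wraps round to the table at the top of RAM.

```python
def quarter_square_table(limit):
    """table[n] = floor(n * n / 4) for n in 0..limit."""
    return [n * n // 4 for n in range(limit + 1)]


# Large enough for a + b when a and b are both 8-bit values.
TABLE = quarter_square_table(510)


def multiply(a, b):
    """a * b from two table lookups and one subtraction."""
    # a + b and a - b always have the same parity, so the two
    # rounded-down quarter squares differ by exactly a * b.
    return TABLE[a + b] - TABLE[abs(a - b)]
```

For example, `multiply(13, 11)` looks up 24 * 24 // 4 = 144 and 2 * 2 // 4 = 1 and returns 143.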

On Sat, Oct 24, 2009 at 8:43 PM, Chris Pile chris.p...@blueyonder.co.uk wrote:
[...]




Re: Getting back into z80 programming (code included)

2009-10-24 Thread Andrew Collier

On 24 Oct 2009, at 20:08, Thomas Harte wrote:


Anyway, if some of you z80 experts could have a quick look and tell me
if I'm making any obvious style errors or otherwise missing obvious
optimisations — even if only on a peephole level — I'd be infinitely
grateful.


One trick I almost always use, is instead of:


NUMVERTS:
db 0

...

ld a, (NUMVERTS)


I would write:

NUMVERTS: equ $+1
ld a,00

i.e. the data byte of the instruction is overwritten when the symbol
is written to (other code can write to the symbol as normal). You can
safely do this even if you're writing to the very next instruction,
i.e. there are no pipelining issues to be concerned with. (Naturally, I
would be shot for proposing this on any processor newer than the Z80.)

This is a byte smaller, and 8 t-states faster. That's not much, but it
can be useful if the value is used frequently. Of course, if you read
the variable in several places only one of them can be modified in this
way; choose the one which is executed most often.

You can do this for any instruction which takes a literal data byte or
word (use EQU $+2 for an instruction which uses the index registers).
The only gotcha is that you must be careful to write the correct data
size back, i.e. never write a double-register to storage allocated by
'ld a,00' otherwise you will corrupt the following instruction.
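
To make the gotcha concrete, here is a tiny Python model of memory holding `ld a,00` followed by another instruction. The opcodes are the real ones (0x3E for `ld a,n`, 0x47 for `ld b,a`), but the three-byte program, the addresses and the helper names are invented for illustration.

```python
LD_A_N = 0x3E      # opcode for: ld a, n
LD_B_A = 0x47      # opcode for: ld b, a


def assemble():
    # address 0: ld a, 0     (opcode at 0, operand byte at 1)
    # address 2: ld b, a
    return bytearray([LD_A_N, 0x00, LD_B_A])


NUMVERTS = 0 + 1   # NUMVERTS: equ $+1, with $ = 0 in this toy program


def store_byte(mem, addr, value):
    """Like ld (addr), a: writes one byte."""
    mem[addr] = value & 0xFF


def store_word(mem, addr, value):
    """Like ld (addr), hl: writes two bytes, low byte first."""
    mem[addr] = value & 0xFF
    mem[addr + 1] = (value >> 8) & 0xFF


good = assemble()
store_byte(good, NUMVERTS, 5)       # operand patched, next opcode intact

bad = assemble()
store_word(bad, NUMVERTS, 0x0005)   # high byte lands on the next opcode
```

After the byte store the program reads `ld a,5 / ld b,a`. After the word store the `ld b,a` opcode has been overwritten with 0x00, which the Z80 executes as a `nop`.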


By the way:


ld d, (NUMVERTS)


I don't think you can do this?
If you've managed to persuade pyz80 to accept that, I'd be interested  
to see what opcode it generated...


NB. the transformed alternative *is* available. i.e.:
NUMVERTS: equ $+1
ld d,00

HTH,
Andrew

--
http://www.intensity.org.uk/





Re: Getting back into z80 programming (code included)

2009-10-24 Thread Thomas Harte
> One trick I almost always use, is instead of:
> [...]

Oh, yes, smart move! I'm pretty sure I had at least one copy of
Electron User that thought this technique so magnificent that it got a
front page mention as "discover extra registers" or something like
that.

> By the way:
>
> ld d, (NUMVERTS)
>
> I don't think you can do this?

No, you're right, you can't. It silently substituted ld a, (NUMVERTS),
so that loop was running quite a bit longer than it needed to and the
result not being visibly different unless the polygon hits the first
scanline. So easy to miss.

To be honest, more than 50% of my bugs today have been the result of
pyz80 silently substituting legal code for illegal code. All related
to my sudden haziness on the z80, of course.

On Sat, Oct 24, 2009 at 11:20 PM, Andrew Collier
and...@intensity.org.uk wrote:
[...]






Re: Getting back into z80 programming (code included)

2009-10-24 Thread Andrew Collier

On 24 Oct 2009, at 23:46, Thomas Harte wrote:


>> By the way:
>>
>>   ld d, (NUMVERTS)
>>
>> I don't think you can do this?
>
> No, you're right, you can't. It silently substituted ld a, (NUMVERTS),
> so that loop was running quite a bit longer than it needed to and the
> result not being visibly different unless the polygon hits the first
> scanline. So easy to miss.



Which version of pyz80 are you using? For me, this instruction is  
caught by a testcase:


$ cat  test.z80s
NUMVERTS: db 0
ld d,(NUMVERTS)
$ pyz80 test.z80s
pass  1 ...
Error: Illegal combination of operands
test.z80s:1 ld d,(NUMVERTS)
Error: OpCode not recognised
test.z80s:1 ld d,(NUMVERTS)

Presumably your longer code sequence is catching it out somehow...

As an aside, it's rather unfortunate that Zilog chose to use
parentheses to denote memory dereferences, as they're ambiguous with
the parentheses used for grouping in arithmetic expressions. It's not
immediately obvious that the following examples generate entirely
different instructions, but that is a consequence of the only useful
way I could parse them!


ld hl,(NUMVERTS + 1)
ld hl,(NUMVERTS) + 1

Andrew

--
http://www.intensity.org.uk/





Re: Getting back into z80 programming (code included)

2009-10-24 Thread Thomas Harte
With most of Andrew's comments not yet incorporated, the source code
that at least does a complete poly fill is below. So I've implemented
the scanning for x negative and written a quick DrawScanline function.
I guess the latter is going to be the only new interesting bit. And I
know I'm posting prematurely, but it's late and I don't expect to have
any time to work on this again until next weekend.

There's no comment to the effect, but I am of course assuming that the
jumping into the block of 'push de's with only some low byte
arithmetic is safe because the DrawScanline function is 256-byte
aligned at entry and substantially less than 192 bytes up to that
point.
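
The alignment assumption can be checked with a small Python model of the 8-bit sums that build HL before the `jp (hl)`. The example addresses are invented. What the model relies on is that `push de` is a one-byte opcode (0xD5), so each byte of the run is one push, and that only L is adjusted while H is loaded as a constant.

```python
PUSH_DE = 0xD5      # one-byte opcode: one push per byte of the run


def jump_target(pushrun, width):
    """Address reached by jp (hl) for a given pixel width (0..255)."""
    a = (width >> 2) & 0xFF             # srl a / srl a
    a ^= 0xFF                           # xor 255
    a = (a + 1) & 0xFF                  # inc a   -> minus (width / 4)
    a = (a + 64) & 0xFF                 # add 64  -> 64 - width / 4
    a = (a + (pushrun & 0xFF)) & 0xFF   # add low byte of pushrun
    return (pushrun & 0xFF00) | a       # h = high byte, l = a


def pushes_executed(pushrun, target):
    """Pushes left between the entry point and the end of the 64."""
    return pushrun + 64 - target


def low_byte_is_safe(pushrun):
    """The sum never carries out of L if the whole run stays in one page."""
    return (pushrun & 0xFF) + 64 <= 0xFF
```

With a hypothetical `pushrun` at 0x8140 every width lands on the right entry point. At 0x81E0 a width of 0 would need to reach 0x8220, but the low-byte sum wraps to 0x8120 instead, which is the case the 192-byte limit rules out.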

Apart from seeing what I can tidy based on Andrew's dynamic
modification hint and giving it a proper read through again when I
have some perspective, I guess I should look into shoving at least my
temporary variables and ideally some code into the 64 bytes that'll
never be used at the end of each y intercept table.

Incidentally, a quad with corners (2, 70), (28, 20), (85, 30) and (5,
90) costs only about 70,000 cycles (very approximately, and I mean
real machine cycles with contention taken into account measured
empirically on Sim Coupe). So that's about 70 cycles/pixel for that
shape, which I think is not awful.

;
;   DrawPoly - draws a polygon using A vertices, in two arrays
;   with x positions starting at (H:0) and y positions at (H+1:0),
;   filled with colour b (high and low nibbles will be plotted;
;   stippling is an option)
;
;   clobbers: af, bc, de, hl, af', bc', de', hl'
;

DS ALIGN 256

LEFTTAB:
ds 256
RIGHTTTAB:
ds 256

@DrawScanline:

push af
push hl
push bc
ld (@+SPBackup), sp

; draw from c to a (a is on the right) on line l

; get the address of the first pixel into hl
ld h, l
ld l, a

; get the length of the line into b
sub c
ld b, a

; check if we're starting on an odd pixel
scf
rr h
rr l

; a is one after the last pixel; draw only up to a,
; not up to and including
jr c, @+nohangingpixel

dec b

ld a, (hl)
and 0xf

@HighColour1: equ $+1
or 0xf0

ld (hl), a
dec l

@nohangingpixel:
inc l

; draw main body of pixels here - divide width by 4 and loop

ld a, b
srl a
srl a

ld sp, hl
@FullColour1: equ $+1
ld d, 0xff
@FullColour2: equ $+1
ld e, 0xff

xor 255
inc a
add 64

ld h, @+pushrun >> 8
add @+pushrun \ 256
ld l, a
jp (hl)

@pushrun:

; push de, 64 times over - kept elsewhere for tidiness
INCLUDE pushde64.z80s

; check if there's an extra 2 pixels to draw
rr b
rl c
rr b
jr nc, @+noextradouble

dec sp
pop de

@FullColour3: equ $+1
ld e, 0xff
push de

; check if there's an extra 1 pixel to draw

@noextradouble:
rr c
jr nc, @+noextrasingle

dec sp
pop de
ld a, e
and 0xf0
@LowColour1: equ $+1
or 0xf
ld e, a
push de

@noextrasingle:

ld sp, (@+SPBackup)
pop bc
pop hl
pop af

ret



NUMVERTS:
db 0
VERTEXPOINTER:
dw 0

STARTY:
db 0
ENDY:
db 0

DrawPoly:

; dynamically reprogram scanline filler now, so b can be forgotten
; after this
ld (NUMVERTS), a
ld a, b

polyt:
ld (@-FullColour1), a
ld (@-FullColour2), a
ld (@-FullColour3), a
and 0x0f
ld (@-LowColour1), a
ld a, b
and 0xf0
ld (@-HighColour1), a

; store stuff
ld a, (NUMVERTS)
ld 

Re: Getting back into z80 programming (code included)

2009-10-24 Thread Thomas Harte
Oh, your implied guess was quite right, I was a version behind. I'd
say you fooled me by leaving "Version 1.1, released 13 April 2007" at
the top of the readme despite having updated the version history, but
actually I spotted 1.2, downloaded it and then failed to do anything
more whatsoever.

Entirely my own fault, apologies.

On Sun, Oct 25, 2009 at 12:09 AM, Andrew Collier
and...@intensity.org.uk wrote:
[...]