Getting back into z80 programming (code included)
Having prototyped my new polygon filler to my satisfaction in C, today I've been converting it to assembler. With the iPhone stuff and an Acorn Electron project I've been working on, I haven't done any z80 in far too long and am not particularly optimistic that I'm writing good stuff. Actually, it strikes me that I've never really shown any z80 code to anyone, so maybe I'm just not great in general.

Below is most of my new polygon filler. It's incomplete, but only in relatively minor ways. The scan converter handles only edges where x increases (obviously x decreasing will be the same code with subs and decs rather than adds and incs; I thought I'd leave that until I'm more confident in the stuff overall). It also chucks pixels on the screen to show scanline ends rather than drawing an actual scanline of pixels (for which I'll be subverting SP per the usual sort of stuff).

When calculating y intercepts it breaks down to either traditional Bresenham for lines that change in y more than in x, or run-slice Bresenham for lines that change in x more than in y. Part of the reasoning for that is that it gives me something to compare the speeds of the two approaches. If run-slice does seem to be faster than standard for lines above a certain length (probably 9 or 10 pixels?), as I suspect, then obviously I'll use it for both.

Anyway, if some of you z80 experts could have a quick look and tell me if I'm making any obvious style errors or otherwise missing obvious optimisations, even if only on a peephole level, I'd be infinitely grateful. Sorry if the comments are occasionally a bit opaque; some of them just document which registers are holding which variables from the original C.

Thanks in advance!
;
; DrawPoly - draws a filled polygon using A vertices, in two arrays
; with x positions starting at (H:0) and y positions at (H+1:0)
;
; clobbers: af, bc, de, hl, af', bc', de', hl'
;
        DS ALIGN 256
LEFTTAB:        ds 256
RIGHTTTAB:      ds 256
NUMVERTS:       db 0
VERTEXPOINTER:  dw 0
STARTY:         db 0
ENDY:           db 0

DrawPoly:
        ; store stuff
        ld (VERTEXPOINTER), hl
        ld (NUMVERTS), a
        inc h
        ld e, a
        ld d, a

        ; use b to store current highest vertex pointer, c to store value
        ld l, 0
        ld b, 0
        ld c, (hl)

        ; get highest vertex pointer to b
@highloop:
        inc l                   ; look at next y value

        ; check if loop is over yet, exit if so
        dec d
        jr z, @+highloopdone

        ld a, (hl)              ; load new y value
        cp c                    ; compare to current highest
        jr nc, @-highloop       ; don't do anything if it is lower
        ld b, l
        ld c, a
        jr @-highloop

@highloopdone:
        ; highest value is now in c
        ld a, c
        ld (ENDY), a

        ; use c to store current lowest vertex pointer, d to store value
        ld l, 0
        ld c, 0
        ld d, (hl)

        ; get highest vertex pointer to c
@lowloop:
        inc l                   ; look at next y value

        ; check if loop is over yet, exit if so
        dec e
        jr z, @+lowloopdone

        ld a, (hl)
        cp d
        jr c, @-lowloop
        ld c, l
        ld d, a
        jr @-lowloop

@lowloopdone:
        ; highest value is now in d
        ld a, d
        ld (STARTY), a

        push bc

        ; b = current vertex, c = target
        ld hl, RIGHTTTAB
@leftloop:
        ld a, b
        cp c
        jr z, @+leftloopdone
        dec a
        jp p, @+noreload
        ld a, (NUMVERTS)
        dec a
@noreload:
        call @+PushToArray
        ld b, a
        jr @-leftloop

@leftloopdone:
        pop bc
        ld hl, LEFTTAB
        ld d, (NUMVERTS)
@rightloop:
        ld a, b
        cp c
        jr z, @+rightloopdone
        inc a
        cp d
        jr nz, @+noreload
        xor a
@noreload:
        call @+PushToArray
        ld b, a
        jr @-rightloop

@rightloopdone:
        ;
        ; page in the screen, for drawing
        ;
        LD C, HMPR
        IN a, (C)
        push af
        ld a, (rampage)
        OUT (C), a

        ld h, LEFTTAB >> 8      ; high byte of the table address
        ld a, (ENDY)
        ld l, a
        ld a, (STARTY)
        sub l
        ld b, a

@plotloop:
        ; left pixel
        ld a, (hl)
        inc h
        ; right pixel
        ld c, (hl)
        inc l
        dec h

        ld d, l
        ld e, a
        srl d
        rr e
        jr nc, @+rpx
        ld a, 0x0f
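For readers comparing against the C side of things, the "traditional Bresenham" case from the post (an edge that changes more in y than in x) can be sketched roughly as below. This is a minimal illustration and not the original prototype: the function name `scan_edge_ymajor`, its signature and the half-interval starting error are all assumptions; only the idea of one x intercept per scanline, stored in a table indexed by y like LEFTTAB and RIGHTTTAB, comes from the listing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: stores one x intercept per scanline for the edge
   (x0, y0) -> (x1, y1). Assumes y1 > y0 and |x1 - x0| <= y1 - y0, i.e.
   the edge changes more in y than in x. table[] is indexed by y. */
static void scan_edge_ymajor(uint8_t *table, int x0, int y0, int x1, int y1)
{
    int dy = y1 - y0;
    int dx = x1 > x0 ? x1 - x0 : x0 - x1;
    int step = x1 > x0 ? 1 : -1;
    int error = dy / 2;             /* start half way through the interval */
    int x = x0;

    assert(dy > 0);
    assert(dx <= dy);

    for (int y = y0; y <= y1; y++) {
        table[y] = (uint8_t)x;      /* record the intercept for this scanline */
        error -= dx;
        if (error < 0) {            /* accumulated error has crossed a pixel */
            x += step;
            error += dy;
        }
    }
}
```

The inner loop does one subtraction and one test per scanline, which is what makes this form attractive for steep edges; the run-slice variant instead works out how many pixels to step per scanline for shallow ones.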
Re: Getting back into z80 programming (code included)
Just a quick glimpse of the code shows you can save 8 clocks every iteration of your plotloop by getting rid of the SET 7,D instructions and putting a SCF before a RR D instruction, i.e. replacing the SRL Ds with RR Ds. But as you're not likely to be using this particular plotloop in the final article it probably doesn't matter too much! That said, using SCF / RR D instead of SRL D / SET 7,D is something worth remembering for later!

Better still, keep your screens at address 0 and do away with the SCFs altogether. Rearranging your registers so that you don't need to keep calculating the screen address is always a good optimisation...

But I take it you're going to use the stack to shove the scanlines on-screen? Which is even better!! I've always been a big fan of using the stack!! :-))

Chris.

@plotloop:
        ; left pixel
        ld a, (hl)
        inc h
        ; right pixel
        ld c, (hl)
        inc l
        dec h

        ld d, l
        ld e, a
        srl d
        rr e
        jr nc, @+rpx
        ld a, 0x0f
        jr @+pxd
@rpx:
        ld a, 0xf0
@pxd:
        set 7, d
        ld (de), a

        ld d, l
        ld e, c
        srl d
        rr e
        jr nc, @+rpx
        ld a, 0x0e
        jr @+pxd
@rpx:
        ld a, 0xe0
@pxd:
        set 7, d
        ld (de), a

        djnz @-plotloop
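The equivalence of the two instruction sequences discussed above can be checked with a small C model. This is only a sketch of the arithmetic, not Z80 code: the function names are made up for illustration, and it assumes what the shift in the plotloop implies, namely D = y and E = x on entry, 128 bytes per line with two pixels per byte, and a screen based at 32768.

```c
#include <assert.h>
#include <stdint.h>

/* Models SRL D / RR E / SET 7,D, entered with D = y and E = x. */
static uint16_t address_srl_set(uint8_t x, uint8_t y)
{
    uint16_t de = (uint16_t)((y << 8) | x);
    de >>= 1;           /* SRL D, RR E: 16-bit shift right, 0 into bit 15 */
    de |= 0x8000;       /* SET 7,D: move the address into the screen at 32768 */
    return de;
}

/* Models SCF / RR D / RR E, entered with D = y and E = x. */
static uint16_t address_scf_rr(uint8_t x, uint8_t y)
{
    uint16_t de = (uint16_t)((y << 8) | x);
    return (uint16_t)(0x8000 | (de >> 1));  /* the set carry rotates into bit 15 */
}

/* Carry left behind by the final RR E: set when x is odd. */
static int carry_after_shift(uint8_t x)
{
    return x & 1;
}

/* Exhaustively confirms that both sequences produce the same address. */
static void check_sequences_agree(void)
{
    for (unsigned y = 0; y < 256; y++)
        for (unsigned x = 0; x < 256; x++)
            assert(address_srl_set((uint8_t)x, (uint8_t)y) ==
                   address_scf_rr((uint8_t)x, (uint8_t)y));
}
```

In both versions the carry that falls out of E is what the `jr nc` in the plotloop tests to pick a nibble mask, so the replacement changes only how bit 15 of the address gets set.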
Re: Getting back into z80 programming (code included)
Oh, yeah, the plotloop is a complete placeholder, just so I can compare output to the C prototype. That said, I'd done exactly the same thing on scf versus set 7,d in my line drawing code (don't worry: just once, outside the loop). I must find time to finish documenting my vector drawing code, as I'm sure that would only benefit from being seen by somebody other than me.

The screen's at 32768 because I have multiplication tables occupying the highest and lowest 2kb of RAM. It's the usual squared and divided by 4 stuff, so by using that positioning I can do the relevant 16-bit addition and subtraction in HL and then read the result directly from RAM without any further computation.

On Sat, Oct 24, 2009 at 8:43 PM, Chris Pile chris.p...@blueyonder.co.uk wrote:
> [...]
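The "squared and divided by 4" tables mentioned above rely on the identity a*b = floor((a+b)^2 / 4) - floor((a-b)^2 / 4), which is exact because a+b and a-b always have the same parity. A minimal C sketch of the idea follows; the single 511-entry table and the function names are assumptions for illustration and do not reflect the actual layout at the top and bottom of RAM, which the thread does not show.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical table: floor(n * n / 4) for n = 0..510, enough for the
   sum of two 8-bit operands. */
static uint16_t quarter_square[511];

static void init_quarter_squares(void)
{
    for (uint32_t n = 0; n < 511; n++)
        quarter_square[n] = (uint16_t)((n * n) / 4);
}

/* Multiplies two unsigned 8-bit values with two table reads and one
   subtraction, and no multiply instruction. */
static uint16_t multiply_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;
    unsigned diff = a > b ? (unsigned)a - b : (unsigned)b - a;

    assert(quarter_square[2] == 1);     /* table must be initialised first */
    return (uint16_t)(quarter_square[sum] - quarter_square[diff]);
}
```

On the Z80 the sum and difference become 16-bit table addresses, which is why placing the tables at fixed positions in memory lets the result be read straight out of RAM.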
Re: Getting back into z80 programming (code included)
On 24 Oct 2009, at 20:08, Thomas Harte wrote:

> Anyway, if some of you z80 experts could have a quick look and tell me if I'm making any obvious style errors or otherwise missing obvious optimisations, even if only on a peephole level, I'd be infinitely grateful.

One trick I almost always use is, instead of:

NUMVERTS:       db 0
                ...
                ld a, (NUMVERTS)

I would write:

NUMVERTS:       equ $+1
                ld a,00

i.e. the data byte of the instruction is overwritten when the symbol is used (other code can write the symbol as normal). You can safely do this even if you're writing to the next consecutive instruction (i.e. there are no pipelining issues to be concerned with. Naturally, I would be shot for proposing this on any processor newer than the Z80).

This is a byte smaller, and 8 t-states faster (not much, but it can be useful if the value is used frequently. Of course, if you read the variable in several places only one of them can be modified in this way. Choose the one which is executed most often).

You can do this for any instruction which takes a literal data byte or word (use EQU $+2 for an instruction which uses the index registers). The only gotcha is that you must be careful to write the correct data size back, i.e. never write a double-register to storage allocated by 'ld a,00', otherwise you will corrupt the following instruction.

By the way:

                ld d, (NUMVERTS)

I don't think you can do this? If you've managed to persuade pyz80 to accept that, I'd be interested to see what opcode it generated...

NB. the transformed alternative *is* available, i.e.:

NUMVERTS:       equ $+1
                ld d,00

HTH,
Andrew

--
http://www.intensity.org.uk/
Re: Getting back into z80 programming (code included)
> One trick I almost always use, is instead of: [...]

Oh, yes, smart move! I'm pretty sure I had at least one copy of Electron User that thought this technique so magnificent that it got a front page mention as "discover extra registers" or something like that.

> By the way:
>                 ld d, (NUMVERTS)
> I don't think you can do this?

No, you're right, you can't. It silently substituted ld a, (NUMVERTS), so that loop was running quite a bit longer than it needed to, with the result not being visibly different unless the polygon hits the first scanline. So easy to miss.

To be honest, more than 50% of my bugs today have been the result of pyz80 silently substituting legal code for illegal code. All related to my sudden haziness on the z80, of course.

On Sat, Oct 24, 2009 at 11:20 PM, Andrew Collier and...@intensity.org.uk wrote:
> [...]
Re: Getting back into z80 programming (code included)
On 24 Oct 2009, at 23:46, Thomas Harte wrote:

> > By the way: ld d, (NUMVERTS) I don't think you can do this?
>
> No, you're right, you can't. It silently substituted ld a, (NUMVERTS), so that loop was running quite a bit longer than it needed to and the result not being visibly different unless the polygon hits the first scanline. So easy to miss.

Which version of pyz80 are you using? For me, this instruction is caught by a testcase:

$ cat test.z80s
NUMVERTS: db 0
        ld d,(NUMVERTS)
$ pyz80 test.z80s
pass 1 ...
Error: Illegal combination of operands
test.z80s:1 ld d,(NUMVERTS)
Error: OpCode not recognised
test.z80s:1 ld d,(NUMVERTS)

Presumably your longer code sequence is catching it out somehow...

As an aside, it's rather unfortunate that Zilog chose to use parentheses to denote memory dereferences, as they're ambiguous with mathematical ordering operators. It's not immediately obvious that the following examples generate entirely different instructions, but that is a consequence of the only useful way I could parse them!

        ld hl,(NUMVERTS + 1)
        ld hl,(NUMVERTS) + 1

Andrew

--
http://www.intensity.org.uk/
Re: Getting back into z80 programming (code included)
With most of Andrew's comments not yet incorporated, the source code that at least does a complete poly fill is below. I've implemented the scanning for x negative and written a quick DrawScanline function. I guess the latter is going to be the only new interesting bit. And I know I'm posting prematurely, but it's late and I don't expect to have any time to work on this again until next weekend.

There's no comment to the effect, but I am of course assuming that jumping into the block of 'push de's with only some low byte arithmetic is safe, because the DrawScanline function is 256-byte aligned at entry and substantially less than 192 bytes long up to that point.

Apart from seeing what I can tidy based on Andrew's dynamic modification hint and giving it a proper read through again when I have some perspective, I guess I should look into shoving at least my temporary variables, and ideally some code, into the 64 bytes that'll never be used at the end of each y intercept table.

Incidentally, a quad with corners (2, 70), (28, 20), (85, 30) and (5, 90) costs only about 70,000 cycles (very approximately, and I mean real machine cycles with contention taken into account, measured empirically on SimCoupe). So that's about 70 cycles/pixel for that shape, which I think is not awful.
;
; DrawPoly - draws a polygon using A vertices, in two arrays
; with x positions starting at (H:0) and y positions at (H+1:0),
; filled with colour b (high and low nibbles will be plotted;
; stippling is an option)
;
; clobbers: af, bc, de, hl, af', bc', de', hl'
;
        DS ALIGN 256
LEFTTAB:        ds 256
RIGHTTTAB:      ds 256

@DrawScanline:
        push af
        push hl
        push bc
        ld (@+SPBackup), sp

        ; draw from c to a (a is on the right) on line l

        ; get the address of the first pixel into hl
        ld h, l
        ld l, a

        ; get the length of the line into b
        sub c
        ld b, a

        ; check if we're starting on an odd pixel
        scf
        rr h
        rr l
        jr c, @+nohangingpixel

        ; a is one after the last pixel; draw only up to a, not up to and including
        dec b
        ld a, (hl)
        and 0xf
@HighColour1:   equ $+1
        or 0xf0
        ld (hl), a
        dec l

@nohangingpixel:
        inc l

        ; draw main body of pixels here - divide width by 4 and loop
        ld a, b
        srl a
        srl a
        ld sp, hl
@FullColour1:   equ $+1
        ld d, 0xff
@FullColour2:   equ $+1
        ld e, 0xff

        xor 255
        inc a
        add 64
        ld h, @+pushrun >> 8    ; high byte of the push run's address
        add @+pushrun \ 256
        ld l, a
        jp (hl)

@pushrun:
        INCLUDE pushde64.z80s   ; push de, 64 times over - kept elsewhere for tidiness

        ; check if there's an extra 2 pixels to draw
        rr b
        rl c
        rr b
        jr nc, @+noextradouble
        dec sp
        pop de
@FullColour3:   equ $+1
        ld e, 0xff
        push de

        ; check if there's an extra 1 pixel to draw
@noextradouble:
        rr c
        jr nc, @+noextrasingle
        dec sp
        pop de
        ld a, e
        and 0xf0
@LowColour1:    equ $+1
        or 0xf
        ld e, a
        push de

@noextrasingle:
        ld sp, (@+SPBackup)
        pop bc
        pop hl
        pop af
        ret

NUMVERTS:       db 0
VERTEXPOINTER:  dw 0
STARTY:         db 0
ENDY:           db 0

DrawPoly:
        ; dynamically reprogram scanline filler now, so b can be forgotten after this
        ld (NUMVERTS), a
        ld a, b
polyt:
        ld (@-FullColour1), a
        ld (@-FullColour2), a
        ld (@-FullColour3), a
        and 0x0f
        ld (@-LowColour1), a
        ld a, b
        and 0xf0
        ld (@-HighColour1), a

        ; store stuff
        ld a, (NUMVERTS)
        ld
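The low-byte arithmetic used to jump into the run of PUSH DE instructions can be modelled in C as below. This is a sketch under stated assumptions rather than anything from the original source: `run_entry` and `RUN_LENGTH` are hypothetical names, each PUSH DE is taken to be one byte of code, and the second assertion encodes the safety condition described in the post (the run starts less than 192 bytes into a 256-byte page, so adding 64 to the low byte can never carry).

```c
#include <assert.h>
#include <stdint.h>

#define RUN_LENGTH 64   /* consecutive PUSH DE opcodes, one byte each */

/* Returns the address to jump to so that exactly `pushes` PUSH DE
   instructions execute before control falls out of the end of the run. */
static uint16_t run_entry(uint16_t run_start, uint8_t pushes)
{
    uint8_t low;

    assert(pushes <= RUN_LENGTH);
    /* Only the low byte is adjusted, so the whole run must sit in one page. */
    assert((run_start & 0xff) + RUN_LENGTH <= 0xff);

    low = (uint8_t)(0 - pushes);                /* XOR 255 / INC A: negate */
    low = (uint8_t)(low + RUN_LENGTH);          /* ADD 64 */
    low = (uint8_t)(low + (run_start & 0xff));  /* ADD low byte of run address */
    return (uint16_t)((run_start & 0xff00) | low);
}
```

Asking for 64 pushes lands on the first opcode of the run and asking for none lands just past its end, so the scanline width divided by 4 selects how much of the unrolled block runs without any loop counter.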
Re: Getting back into z80 programming (code included)
Oh, your implied guess was quite right, I was a version behind. I'd say you fooled me by leaving "Version 1.1, released 13 April 2007" at the top of the read me despite having updated the version history, but actually I spotted 1.2, downloaded it and then failed to do anything more whatsoever. Entirely my own fault, apologies.

On Sun, Oct 25, 2009 at 12:09 AM, Andrew Collier and...@intensity.org.uk wrote:
> [...]