Nicholas S-A wrote:
Ultimately, we may decide not to even have a multiply instruction and
just code it when necessary.  This would be horribly slow, but if it's
a rare event, it won't matter so much.  All I can think of for this is
where we want to multiply a 16-bit unsigned line stride by a 16-bit
signed Y coordinate.  In other cases, we multiply by a constant,
eliminating any branches (or decisions anyhow) entirely.

Not really horribly slow, actually. If you use the Russian peasants algorithm [1],
you can implement it in around 200 cycles. In pseudocode:
z=x*y, t1 temporary.
run 32 times:
    mov x to t1
    AND t1 with 0x0001 -- these two just get the last bit
    skip if zero:
        add y to z -- this deals with remainder
    shift x right  1 -- half it
    shift z left 1 -- double it

which should be 32 * 6 = 192 cycles.

There's loop control overhead, too:

    mult:  move 32 to t2
           move  0 to  z
    loop:  move  x to t1
           and t1 with 1
           beq skip
           shift x right 1         # delay slot
           add   y to z
    skip:  subtract 1 from t2
           bne loop
           shift z left 1          # delay slot

Which is 32 * (7 or 8) + 2 = 226-258 cycles, I guess. Still around 200, as you said, but it's also still an order of magnitude slower than the same algorithm implemented in hardware (and I believe it would be comparatively cheap in hardware). As for whether or not that's horrible, well, it's a matter of opinion. :)
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)

Reply via email to