Nicholas S-A wrote:
Ultimately, we may decide not to even have a multiply instruction and
just code it when necessary. This would be horribly slow, but if it's
a rare event, it won't matter so much. All I can think of for this is
where we want to multiply a 16-bit unsigned line stride by a 16-bit
signed Y coordinate. In other cases, we multiply by a constant,
eliminating any branches (or decisions anyhow) entirely.
Not really horribly slow, actually. If you use the Russian peasants
algorithm [1],
you can implement it in around 200 cycles. In pseudocode:
z=x*y, t1 temporary.
run 32 times:
mov x to t1
AND t1 with 0x0001 -- these two just get the last bit
skip if zero:
add y to z -- this deals with remainder
shift x right 1 -- half it
shift z left 1 -- double it
which should be 32 * 6 = 192 cycles.
There's loop control overhead, too:
mult: move 32 to t2
move 0 to z
loop: move x to t1
and t1 with 1
beq skip
shift x right 1 # delay slot
add y to z
skip: subtract 1 from t2
bne loop
shift z left 1 # delay slot
Which is 32 * (7 or 8) + 2 = 226-258 cycles, I guess. Still around 200,
as you said, but it's also still an order of magnitude slower than the
same algorithm implemented in hardware (and I believe it would be
comparatively cheap in hardware). As for whether or not that's
horrible, well, it's a matter of opinion. :)
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)