> Ultimately, we may decide not to even have a multiply instruction and
> just code it when necessary. This would be horribly slow, but if it's
> a rare event, it won't matter so much. All I can think of for this is
> where we want to multiply a 16-bit unsigned line stride by a 16-bit
> signed Y coordinate. In other cases, we multiply by a constant,
> eliminating any branches (or decisions anyhow) entirely.

Not really horribly slow, actually. If you use the Russian peasant
algorithm [1], you can implement it in around 200 cycles. In pseudocode:

z = x*y, t1 temporary; z starts at 0.
run 32 times:
    mov x to t1
    AND t1 with 0x0001   -- these two just get the last bit
    skip if zero:
        add y to z       -- last bit of x is set, so y contributes
    shift x right 1      -- halve it
    shift y left 1       -- double it

which should be 32 * 6 = 192 cycles in the worst case, assuming one
cycle per instruction.
Of course, it is 16 * 6 = 96 if you are doing 16x16 (and 1 or 2 less
with 16x15 with sign) multiplication.
This might be wrong (in terms of what ends up where), but it is the
right idea.
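
For anyone who wants to check the idea on a host machine, here is a
rough C sketch of the same shift-and-add loop. It is only an
illustration, not code for the actual hardware: the function names
mul_shift_add and stride_times_y are made up for this example, and the
sign handling (multiply the magnitude, then negate) is just one
assumed way to cover the stride * Y case.

```c
#include <stdint.h>

/* Shift-and-add ("Russian peasant") multiply.
 * One loop pass per bit of x; result is x*y modulo 2^32.
 * 'bits' is how many low bits of x to examine (16 or 32). */
uint32_t mul_shift_add(uint32_t x, uint32_t y, int bits)
{
    uint32_t z = 0;
    for (int i = 0; i < bits; i++) {
        if (x & 1u)   /* last bit of x set: y contributes to the sum */
            z += y;
        x >>= 1;      /* halve x */
        y <<= 1;      /* double y */
    }
    return z;
}

/* Hypothetical helper for the case from the thread:
 * 16-bit unsigned line stride times 16-bit signed Y coordinate.
 * Assumption: strip the sign, multiply magnitudes, negate afterwards. */
int32_t stride_times_y(uint16_t stride, int16_t ycoord)
{
    int negative = ycoord < 0;
    uint32_t mag = negative ? (uint32_t)(-(int32_t)ycoord)
                            : (uint32_t)ycoord;
    uint32_t p = mul_shift_add(mag, stride, 16);
    return negative ? -(int32_t)p : (int32_t)p;
}
```

Putting the Y magnitude in x keeps the loop at 16 passes, and the
largest possible product (65535 * 32768) still fits in a signed 32-bit
result.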
nick
[1] http://mathforum.org/dr.math/faq/faq.peasant.html