Ultimately, we may decide not to even have a multiply instruction and
just code it when necessary.  This would be horribly slow, but if it's
a rare event, it won't matter so much.  All I can think of for this is
where we want to multiply a 16-bit unsigned line stride by a 16-bit
signed Y coordinate.  In other cases, we multiply by a constant,
eliminating any branches (or decisions anyhow) entirely.

Not really horribly slow, actually. If you use the Russian peasant
algorithm [1], you can implement it in around 200 cycles. In pseudocode:
z = x*y; z starts at 0, t1 is a temporary.
run 32 times:
        mov x to t1
        AND t1 with 0x0001 -- these two just get the last bit
        skip if zero:
                add y to z -- this deals with the remainder (odd x)
        shift x right 1 -- halve it
        shift y left 1 -- double it

which should be 32 * 6 = 192 cycles.
Of course, it is 16 * 6 = 96 if you are doing a 16x16 multiplication
(and 1 or 2 fewer for 16x15 with a separate sign).
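
For anyone who wants to check the loop on a host machine, here is a
minimal sketch in C. The name mul_shift_add is made up for illustration,
and this is plain C rather than our instruction set, so it says nothing
about the real cycle count. It doubles y each pass and accumulates into z.

```c
#include <stdint.h>

/* Hypothetical helper: shift-and-add (Russian peasant) multiply.
 * Returns the low 32 bits of x * y. */
uint32_t mul_shift_add(uint32_t x, uint32_t y)
{
    uint32_t z = 0;                 /* accumulator starts at zero */
    for (int i = 0; i < 32; i++) {  /* one pass per bit of x */
        if (x & 1u)                 /* is the last bit of x set? */
            z += y;                 /* add the current multiple of y */
        x >>= 1;                    /* halve x */
        y <<= 1;                    /* double y */
    }
    return z;
}
```

The loop could also stop as soon as x reaches zero, which would help when
x is small, at the cost of one more test per pass.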

This might be wrong (in terms of what ends up where), but it is the right idea.
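
For the stride-times-Y case from the quoted message, one way to handle
the sign is to multiply by the magnitude of Y and negate afterwards. The
sketch below assumes that approach. The name stride_times_y is
hypothetical, and driving the loop from Y rather than from the stride is
my choice here, not something we have settled on.

```c
#include <stdint.h>

/* Hypothetical helper: 16-bit unsigned line stride times 16-bit signed Y.
 * The largest magnitude is 65535 * 32768, which still fits in int32_t. */
int32_t stride_times_y(uint16_t stride, int16_t y)
{
    int neg = (y < 0);                            /* remember the sign */
    uint32_t mag = neg ? (uint32_t)(-(int32_t)y)  /* |y|, at most 32768 */
                       : (uint32_t)y;
    uint32_t m = stride;                          /* doubled each pass */
    uint32_t z = 0;                               /* accumulator */

    for (int i = 0; i < 16; i++) {  /* 16 passes cover |y| up to 32768 */
        if (mag & 1u)
            z += m;
        mag >>= 1;                  /* halve |y| */
        m <<= 1;                    /* double the stride multiple */
    }
    return neg ? -(int32_t)z : (int32_t)z;        /* put the sign back */
}
```

If Y can never be -32768, its magnitude fits in 15 bits and the loop
needs only 15 passes.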

nick
[1] http://mathforum.org/dr.math/faq/faq.peasant.html

_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
