There's loop control overhead, too:
...
Which is 32 * (7 or 8) + 2 = 226-258 cycles, I guess. Still around 200, as
you said, but it's also still an order of magnitude slower than the same
algorithm implemented in hardware (and I believe it would be comparatively
cheap in hardware). As for whether or not that's horrible, well, it's a
matter of opinion. :)

That was actually my original version, but I didn't want to look at the docs
to find the branch delay, and figured I could say to anybody who brought
this up that we would just unfold all the loops (which we would for this). ;-)

Yes, I agree that hardware is much, much better, even with just an MSTEP
instruction (which could just do the two shifts and tested-add), but I did
want to bring up that it is not <horrible> if we are pressed for space in
the XP10 and need to do software multiply.
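
For anyone following along, here is a rough C sketch of the shift-and-add
multiply being discussed. It is only an illustration: the names mstep() and
soft_mul32() are made up for this example, and the real MSTEP semantics
have not been defined anywhere in this thread. I am assuming one step means
"add the multiplicand if the low multiplier bit is set, then do the two
shifts", and that we only keep the low 32 bits of the product.

```c
#include <stdint.h>

/* One hypothetical multiply step (assumed MSTEP behaviour, not a spec):
 * a tested add followed by the two shifts. */
static void mstep(uint32_t *acc, uint32_t *multiplicand, uint32_t *multiplier)
{
    if (*multiplier & 1u)
        *acc += *multiplicand;   /* tested add: only when the low bit is set */
    *multiplicand <<= 1;         /* shift 1: next partial-product weight */
    *multiplier >>= 1;           /* shift 2: expose the next multiplier bit */
}

/* 32 steps give the low 32 bits of a * b. In software each step costs
 * several instructions plus loop overhead; the loop could also be fully
 * unfolded into 32 back-to-back steps. */
static uint32_t soft_mul32(uint32_t a, uint32_t b)
{
    uint32_t acc = 0;
    for (int i = 0; i < 32; i++)
        mstep(&acc, &a, &b);
    return acc;
}
```

Calling soft_mul32(6, 7) walks through all 32 steps and returns 42. With a
single-cycle MSTEP the loop body collapses to one instruction per bit,
which is where the hardware assist would win.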

Cheers!
nick

_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)