On 2007-08-13, Andre Pouliot wrote:
>  Another possibility is to create a device that does the multiply, like a
> coprocessor that isn't part of the ALU but is memory mapped with direct
> addressing (if the nanocontroller supports direct and indirect addressing). It
> would look like 4 addressable registers: 2 for the input values and 2 for the
> result. If something like this is done, the multiply could be pipelined as
> long as wanted, and while waiting for the result other operations would be
> possible and wouldn't lock the ALU.

That gives two stores and one fetch.  The current CPU does not compute
anything (other than the address) for store and fetch, so this approach
reduces multiply to 3 instructions, but *only* if we are able to fill
all the instruction slots between the stores and the fetch.  I'm thinking
multiplication may be too cheap an operation to really take advantage
of this technique, but then maybe not.
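
To make the "two stores and one fetch" cost concrete, here is a rough
C model of such a memory-mapped multiplier.  Everything in it is an
assumption made for illustration and not part of the actual
nanocontroller design: the 16-bit operand width, the 4-cycle latency
(MUL_LATENCY), the rule that the store to the second operand starts
the multiply, and all the names (mul_unit, mul_store_a, mul_sequence,
and so on).  It counts how many cycles are wasted when the scheduler
cannot find enough independent instructions to put between the stores
and the fetch.

```c
#include <assert.h>
#include <stdint.h>

enum { MUL_LATENCY = 4 };      /* assumed pipeline depth, in cycles */

typedef struct {
    uint16_t a, b;             /* the two input registers */
    uint32_t result;           /* read back as two 16-bit halves */
    int      busy;             /* cycles left until the result is valid */
} mul_unit;

/* Store to operand A: only latches the value. */
static void mul_store_a(mul_unit *m, uint16_t v) { m->a = v; }

/* Store to operand B: latches the value and starts the multiply. */
static void mul_store_b(mul_unit *m, uint16_t v)
{
    m->b = v;
    m->busy = MUL_LATENCY;
}

/* Advance the multiplier by one clock cycle. */
static void mul_tick(mul_unit *m)
{
    if (m->busy > 0 && --m->busy == 0)
        m->result = (uint32_t)m->a * m->b;
}

/* Fetches are only legal once the pipeline has drained. */
static uint16_t mul_fetch_lo(const mul_unit *m)
{
    assert(m->busy == 0);
    return (uint16_t)(m->result & 0xFFFFu);
}

static uint16_t mul_fetch_hi(const mul_unit *m)
{
    assert(m->busy == 0);
    return (uint16_t)(m->result >> 16);
}

/*
 * Issue store, store, `filler` unrelated one-cycle instructions, then
 * fetch.  Returns the number of stall cycles, i.e. cycles in which
 * nothing useful could be issued while waiting for the result.
 */
int mul_sequence(mul_unit *m, uint16_t a, uint16_t b,
                 int filler, uint32_t *out)
{
    int stalls = 0;

    mul_store_a(m, a);
    mul_store_b(m, b);

    for (int i = 0; i < filler; i++)   /* useful work overlaps the multiply */
        mul_tick(m);

    while (m->busy > 0) {              /* nothing left to do: wait */
        mul_tick(m);
        stalls++;
    }

    *out = ((uint32_t)mul_fetch_hi(m) << 16) | mul_fetch_lo(m);
    return stalls;
}
```

With four filler instructions the stalls vanish and the multiply costs
only the stores and the fetch.  With none, the whole latency is paid
as dead cycles.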

A variant could be to have two dedicated registers for multiply.  Write
to them, and the last write triggers the multiply.  After a number of
cycles the result can be read out.  The advantage of using registers is
that we can do computations both during the stores and during the fetch.
In other words, a zero-instruction multiply, which is of course too good
to be the whole truth.

Whether we use IO ports or registers, there will be strict constraints
on instruction ordering.  This may not be too bad as long as we only do
it for multiply, since multiplies are presumably rare enough that they
don't interfere and create tricky logical optimisation puzzles.
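
The ordering constraint could be checked mechanically by the assembler
or scheduler.  Below is a minimal sketch in C, assuming a fixed latency
(MUL_LATENCY, a made-up value) and assuming the write to the second
operand is the one that triggers the multiply.  The op_kind enum and
check_mul_ordering are hypothetical names, not anything from the
existing tools.

```c
#include <stddef.h>

enum { MUL_LATENCY = 4 };   /* assumed: instructions needed between
                               the triggering write and the read */

typedef enum {
    OP_OTHER,               /* any instruction unrelated to multiply */
    OP_MUL_WRITE_A,         /* write first operand */
    OP_MUL_WRITE_B,         /* write second operand, triggers multiply */
    OP_MUL_READ             /* read the result */
} op_kind;

/*
 * Returns the index of the first result read that comes too early
 * (or that has no preceding trigger), or -1 if the sequence respects
 * the constraint.
 */
int check_mul_ordering(const op_kind *prog, size_t n)
{
    long trigger = -1;      /* index of the latest triggering write */

    for (size_t i = 0; i < n; i++) {
        if (prog[i] == OP_MUL_WRITE_B) {
            trigger = (long)i;
        } else if (prog[i] == OP_MUL_READ) {
            /* instructions strictly between trigger and this read */
            long gap = (long)i - trigger - 1;
            if (trigger < 0 || gap < MUL_LATENCY)
                return (int)i;
        }
    }
    return -1;
}
```

A scheduler could run a check like this after reordering and fall
back to padding with no-ops when it reports a violation.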
