On 2007-08-13, Andre Pouliot wrote:
> Another possibility is to create a device that do the multiply like
> coproccesor that isn't part of the ALU but is memory mapped in direct
> adressing(if the nanocontroller support direct and indirect adressing). It
> would look like 4 adressable register 2 for the input value and 2 for the
> result. If something like this is done the multiply could be pipelined as
> long as wanted and while waiting for the result other operations would be
> possible and wouldn't lock the alu.

That gives two stores and one fetch. The current CPU does not compute
anything (other than the address) for store and fetch, so this approach
reduces a multiply to 3 instructions, but *only* if we are able to fill
all the instruction slots between the stores and the fetch with other
useful work. I'm thinking multiplication may be a bit too cheap an
operation to really take advantage of this technique, but then maybe
not.

A variant could be to have two dedicated registers for multiply. Write
to them, and the last write triggers the multiply. After a number of
cycles the result can be read out. The advantage of using registers is
that we can do computations both during the stores and during the fetch.
In other words, a 0-operation multiply, which is of course too good to
be the whole truth.

Whether we use IO ports or registers, there will be strict constraints
on instruction ordering. This may not be too bad as long as we only do
it for multiply, since multiplies are presumably rare enough that they
don't interfere with each other and create tricky logical optimisation
puzzles.
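To make the ordering constraint concrete, here is a rough toy model of the
store/store/fetch variant. Everything in it is an assumption for
illustration only: the class name MulUnit, the 3-cycle latency, one cycle
per instruction, and the idea that an early fetch costs wasted cycles (real
hardware without interlocks might simply return a stale value instead).

```python
class MulUnit:
    """Toy model of a write-triggered multiplier with a fixed latency.

    Hypothetical sketch: names and timings are not from any real design.
    """

    def __init__(self, latency=3):
        self.latency = latency  # assumed pipeline depth in cycles
        self.cycle = 0          # one instruction = one cycle (assumption)
        self.a = 0
        self.b = 0
        self.product = 0
        self.ready_at = 0       # cycle at which the product becomes valid

    def tick(self, n=1):
        """Advance time by n cycles, i.e. n unrelated instructions."""
        self.cycle += n

    def write_a(self, value):
        """First store: latch operand A."""
        self.a = value
        self.tick()

    def write_b(self, value):
        """Second store: latch operand B and start the multiply."""
        self.b = value
        self.product = self.a * self.b
        self.tick()
        self.ready_at = self.cycle + self.latency

    def read(self):
        """Fetch the result; returns (product, wasted_cycles)."""
        wasted = max(0, self.ready_at - self.cycle)
        self.tick(wasted + 1)
        return self.product, wasted


# Wait slots filled with three independent instructions: nothing wasted.
filled = MulUnit(latency=3)
filled.write_a(6)
filled.write_b(7)
filled.tick(3)
print(filled.read())    # (42, 0)

# Fetch issued right after the stores: the whole latency is wasted.
unfilled = MulUnit(latency=3)
unfilled.write_a(6)
unfilled.write_b(7)
print(unfilled.read())  # (42, 3)
```

The multiply only costs its 3 instructions in the first case. In the
register variant the two writes and the read would ride along with other
computations, which this model does not try to capture.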
