On Sat, September 1, 2007 12:34 pm, Petter Urkedal said:
> On 2007-08-31, Mark wrote:
>> I've posted a run-down of the multipliers so far (any important ones
>> missing?) at http://jarvin.net/opengraphics/. This includes photos of
>> the most critical path post-PAR.
>
> Nice overview of the syntheses; I've had some trouble getting the
> synthesis tools working, so I appreciate your effort. I suspect my
> version did not synthesise as I intended. In the attached version I made
> the LUT4s explicit by putting them in a separate module. Not sure if it
> would have made a difference. Well, I think we can go with the radix-4
> version unless there is compelling reason to optimise further *and* it is
> technically feasible to use a 4x clock for the multiplier (which I don't
> know).
>
> So, let's consider integrating Farhan's version in the nanocontroller.
> Given that the VGA code will use 16 bit, would it be better to reduce the
> multiplier to 16x16->32? Will this be insufficient for the DMA code?
> (Does DMA require multiply at all, other than powers of 2?) Conversely, is
> a 33-cycle multiply too slow for the VGA code, and would 17 cycles be fast
> enough?
>

16x16 will take 9 cycles. Perhaps it could be fast enough to be clocked at
200 MHz if it is a dedicated 16x16 part. I'll try this out. I will also try
adding a special mode to the current 32x32 version that assumes 16-bit inputs
and takes 9 cycles to complete in that mode.

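As a rough sanity check on the cycle counts above (9 for 16x16, 17 and 33 for
32x32), here is a minimal Python sketch of an iterative multiplier that retires
two multiplier bits per clock. It is only a behavioural model, not the Verilog
under discussion: the function names are made up for illustration, the operands
are treated as unsigned, and the single overhead cycle is an assumption chosen
because it reproduces the figures quoted in this thread.

```python
def multiply_cycles(width, bits_per_cycle=2, overhead=1):
    """Cycle count for an iterative multiplier.

    bits_per_cycle=2 models a radix-4 design, 1 models plain shift-and-add.
    overhead=1 is an assumed load/finish cycle, not taken from the real RTL.
    """
    steps = -(-width // bits_per_cycle)  # ceiling division
    return steps + overhead


def radix4_multiply(a, b, width=16):
    """Unsigned width x width -> 2*width multiply, one radix-4 digit per cycle.

    Returns (product, cycles). Hypothetical model for illustration only.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    product = 0
    cycles = 1  # assumed overhead cycle for loading the operands
    for shift in range(0, width, 2):
        digit = (b >> shift) & 0b11        # next two multiplier bits: 0..3
        product += (digit * a) << shift    # add 0, a, 2a or 3a at this weight
        cycles += 1
    return product, cycles
```

Under these assumptions multiply_cycles(16) gives 9, multiply_cycles(32) gives
17, and multiply_cycles(32, bits_per_cycle=1) gives 33, which matches the
numbers in the discussion.
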
> I'd go with the non-blocking out-of-band approach. That is, the
> programmer will count instructions before fetching the result. One
> instruction takes a reg and a reg/imm and issues the multiply, and
> another writes back the result to a reg. The ALU stage can be the point of
> transit. The issue-multiply instruction transfers the ALU operands to
> the multiplier and initiates the multiply. The multiplier holds the
> result after finishing as long as no new multiply is issued. A
> fetch-product instruction moves the result to the ALU output, thus
> allowing it to take part in register forwarding.
>
> As a slight variant, we can hard-code the multiplication result to r31
> and drop the fetch-product instruction. That's just as easy to
> implement, and it saves one cycle, since it means the product can be
> directly used as an operand to the ALU.
>
> The introduction of interrupts, if needed, will not cause problems as
> long as interrupt handlers don't use the multiplier. Moreover, if an
> interrupt handler needs to use the multiplier, this is also possible:
> When the interrupt handler is sure any pending multiplication is
> finished, it can save the result R. Then it can do its own
> multiplication. Before returning to normal code, it must perform a
> multiply R*1 and wait long enough for the result to be available.

_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
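
The issue/fetch scheme and the interrupt save/restore trick quoted above can be
modelled roughly as follows. This is a behavioural sketch in Python, not the
nanocontroller design: the class and function names are hypothetical, the
9-cycle latency is borrowed from the 16x16 estimate, and the product is assumed
to be truncated to a 32-bit register so that R*1 can restore it.

```python
class OutOfBandMultiplier:
    LATENCY = 9            # assumed cycles per multiply
    WORD_MASK = 0xFFFFFFFF  # assumed 32-bit result register

    def __init__(self):
        self.result = 0     # held until the next multiply completes
        self._pending = 0
        self._remaining = 0

    def issue(self, a, b):
        """Issue-multiply: latch operands and start; does not block."""
        self._pending = (a * b) & self.WORD_MASK
        self._remaining = self.LATENCY

    def tick(self, cycles=1):
        """Advance time; stands in for the programmer counting instructions."""
        for _ in range(cycles):
            if self._remaining > 0:
                self._remaining -= 1
                if self._remaining == 0:
                    self.result = self._pending

    def busy(self):
        return self._remaining > 0

    def fetch(self):
        """Fetch-product. In this model an early fetch returns the previous
        product; real hardware might expose a partial value instead."""
        return self.result


def interrupt_handler(mul, x, y):
    """Use the multiplier inside a handler without disturbing the main code."""
    mul.tick(mul.LATENCY)   # wait until any pending multiply must be finished
    saved = mul.fetch()     # save the interrupted code's result R
    mul.issue(x, y)         # the handler's own multiply
    mul.tick(mul.LATENCY)
    own = mul.fetch()
    mul.issue(saved, 1)     # restore R by multiplying R*1
    mul.tick(mul.LATENCY)   # wait long enough for R to be available again
    return own
```

For example, if the main code issues 6*7 and is interrupted three cycles later,
a handler that computes 5*5 this way gets its own product and the main code
still finds its result in place when it fetches.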
