On Sat, September 1, 2007 12:34 pm, Petter Urkedal said:
> On 2007-08-31, Mark wrote:
>> I've posted a run-down of the multipliers so far (any important ones 
>> missing?) at http://jarvin.net/opengraphics/.  This includes photos of
>> the most critical path post-PAR.
> 
> Nice overview of the syntheses;  I've had some trouble getting the 
> synthesis tools working, so I appreciate your effort.  I suspect my 
> version did not synthesise as I intended.  In the attached version I made
> the LUT4s explicit by putting them in a separate module.  Not sure if it
> would have made a difference.  Well, I think we can go with the radix-4
> version unless there is a compelling reason to optimise further *and* it is
> technically feasible to use a 4x clock for the multiplier (which I don't
> know).
> 
> So, let's consider integrating Farhan's version in the nanocontroller. 
> Given that the VGA code will use 16 bit, would it be better to reduce the
> multiplier to 16x16->32?  Will this be insufficient for the DMA code?
> (Does DMA require multiplication at all, other than by powers of 2?)
> Conversely, is a 33-cycle multiply too slow for the VGA code, and would
> 17 cycles be fast enough?
>
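If the nanocontroller multiplier were cut down to 16x16->32, code that occasionally needs a wider product could still assemble one in software from 16x16 partial products. The sketch below is a Python model, not hardware: `mul16` is a hypothetical stand-in for the 16x16 unit. A full 32x32->64 product costs four 16x16 multiplies; if only the low 32 bits matter (as is typical for address arithmetic), the hi*hi term drops out and three are enough.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def mul16(a, b):
    """Hypothetical stand-in for a 16x16->32 hardware multiply."""
    assert 0 <= a <= MASK16 and 0 <= b <= MASK16
    return a * b


def mul32_via_16(a, b, low_only=False):
    """Unsigned 32x32 product assembled from 16x16->32 partial products.

    With low_only=True only the low 32 bits are returned; the hi*hi term
    only affects bits 32 and up, so three 16x16 multiplies suffice.
    """
    assert 0 <= a <= MASK32 and 0 <= b <= MASK32
    a_hi, a_lo = a >> 16, a & MASK16
    b_hi, b_lo = b >> 16, b & MASK16
    cross = mul16(a_hi, b_lo) + mul16(a_lo, b_hi)  # middle partial products
    if low_only:
        return (mul16(a_lo, b_lo) + (cross << 16)) & MASK32
    return (mul16(a_hi, b_hi) << 32) + (cross << 16) + mul16(a_lo, b_lo)
```

Whether the extra issue/fetch overhead is acceptable depends on how often the DMA code would actually need wide products.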
 
16x16 will take 9 cycles. Perhaps it could be fast enough to be clocked 
at 200 MHz if it is a dedicated 16x16 part. I'll try this out. I will also 
try adding a special mode to the current 32x32 version that assumes 16 
bit inputs and takes 9 cycles to complete for that mode.


> I'd go with the non-blocking out-of-band approach.  That is, the 
> programmer will count instructions before fetching the result.  One 
> instruction takes a reg and a reg/imm and issues the multiply, and 
> another writes the result back to a reg.  The ALU stage can be the point of
> transit.  The issue-multiply instruction transfers the ALU operands to
> the multiplier and initiates the multiply.  The multiplier holds the
> result after finishing as long as no new multiply is issued. A
> fetch-product instruction moves the result to the ALU output, thus 
> allowing it to take part in register forwarding.
> 
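The issue/fetch contract above can be pinned down with a small cycle model. This is a Python sketch under stated assumptions, not the nanocontroller design: one instruction retires per cycle (so counting instructions equals counting cycles), and `issue`/`fetch` are hypothetical names for the issue-multiply and fetch-product instructions. The latency of 17 in the usage is only an example.

```python
class OutOfBandMultiplier:
    """Toy cycle model of a non-blocking, out-of-band multiplier.

    issue(a, b, now): the ALU stage hands its two operands to the multiplier
    and starts a multiply at cycle `now`.
    fetch(now): the fetch-product instruction reads the result. Software is
    responsible for counting instructions; fetching too early is a bug,
    which this model reports by raising.
    """

    def __init__(self, latency, width=32):
        self.latency = latency               # cycles until the product is valid
        self.operand_mask = (1 << width) - 1
        self.product = 0
        self.ready_at = 0                    # first cycle the product is valid

    def issue(self, a, b, now):
        a &= self.operand_mask
        b &= self.operand_mask
        self.product = a * b                 # computed eagerly, exposed later
        self.ready_at = now + self.latency

    def fetch(self, now):
        if now < self.ready_at:
            raise RuntimeError(
                f"product fetched at cycle {now}, not valid until {self.ready_at}")
        # The result stays put until the next issue(), so late fetches are fine.
        return self.product


mul = OutOfBandMultiplier(latency=17)
mul.issue(1234, 5678, now=0)
# ... 17 instructions that do not depend on the product ...
print(mul.fetch(now=17))
```

Because the product is held until the next issue, fetching at cycle 40 instead of 17 returns the same value; only fetching before cycle 17 is an error.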
> As a slight variant, we can hard-code the multiplication result to r31 
> and drop the fetch-product instruction.  That's just as easy to 
> implement, and it saves one cycle, since it means the product can be 
> directly used as an operand to the ALU.
> 
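A minimal sketch of the r31 variant, assuming (hypothetically) 32-bit registers, that only the low 32 bits of the product are visible through r31, and that writes to r31 are rejected. None of these choices come from the proposal itself; they are one plausible reading. Reading r31 taps the multiplier output directly, which is where the saved cycle comes from: `issue; wait; add r1, r31, r2` instead of `issue; wait; fetch r3; add r1, r3, r2`.

```python
class RegisterFile:
    """Toy 32-register file where r31 is hard-wired to the multiplier result.

    Assumptions (hypothetical): 32-bit registers, only the low 32 bits of the
    product are visible through r31, and r31 cannot be written.
    """

    PRODUCT_REG = 31

    def __init__(self):
        self.regs = [0] * 32
        self.product = 0                     # driven by the multiplier only

    def multiplier_done(self, product):
        self.product = product & 0xFFFFFFFF

    def read(self, r):
        # Reading r31 taps the multiplier output directly, so the product can
        # feed the ALU without a separate fetch-product instruction.
        if r == self.PRODUCT_REG:
            return self.product
        return self.regs[r]

    def write(self, r, value):
        if r == self.PRODUCT_REG:
            raise ValueError("r31 mirrors the multiplier and is read-only")
        self.regs[r] = value & 0xFFFFFFFF


rf = RegisterFile()
rf.write(2, 100)
rf.multiplier_done(6 * 7)
rf.write(1, rf.read(31) + rf.read(2))        # add r1, r31, r2
print(rf.read(1))
```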
> The introduction of interrupts, if needed, will not cause problems as 
> long as interrupt handlers don't use the multiplier.  Moreover, if an 
> interrupt handler needs to use the multiplier, this is also possible: 
> When the interrupt handler is sure any pending multiplication is 
> finished, it can save the result R.  Then it can do its own 
> multiplication.  Before returning to normal code, it must perform a 
> multiply R*1 and wait long enough for the result to be available. 
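The save/restore sequence for an interrupt handler can be walked through with a small cycle model. This Python sketch is illustrative only; `Multiplier` and `interrupt_handler` are hypothetical names. One subtlety it makes visible: restoring via R*1 only works if R fits in a single multiplier operand. With a 32x32->64 unit whose full 64-bit result is visible to software, the high half could not be put back this way, so the handler would need some other mechanism.

```python
class Multiplier:
    """Minimal non-blocking multiplier: a product is valid `latency` cycles after issue."""

    def __init__(self, latency):
        self.latency = latency
        self.result = 0
        self.ready_at = 0

    def issue(self, a, b, now):
        self.result = a * b
        self.ready_at = now + self.latency

    def read(self, now):
        assert now >= self.ready_at, "result read before the multiply finished"
        return self.result


def interrupt_handler(mul, now, x, y):
    """Multiply x*y inside an interrupt without disturbing the interrupted code.

    Returns (x*y, cycle at which the handler may return).
    """
    # 1. The interrupted code may have a multiply in flight. Waiting one full
    #    latency guarantees it has finished; then save its result R.
    now += mul.latency
    saved = mul.read(now)
    # 2. The handler's own multiply.
    mul.issue(x, y, now)
    now += mul.latency
    mine = mul.read(now)
    # 3. Restore R by computing R*1 and wait for it to complete, so the
    #    interrupted code still finds R. Assumes R fits in one operand.
    mul.issue(saved, 1, now)
    now += mul.latency
    return mine, now


mul = Multiplier(latency=9)
mul.issue(6, 7, now=0)                       # interrupted code's multiply
mine, t = interrupt_handler(mul, now=3, x=5, y=5)
print(mine, t, mul.read(t))
```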
> _______________________________________________
> Open-graphics mailing list
> [email protected]
> http://lists.duskglow.com/mailman/listinfo/open-graphics
> List service provided by Duskglow Consulting, LLC (www.duskglow.com)
