Re: [Open-graphics] Synthesizing oga1hq

Mark Mon, 13 Aug 2007 12:21:05 -0700

Petter Urkedal wrote:

If we want a compromise, we could instead implement a 32x16->32
multiply.  That is, two multipliers in the ALU stage, and an adder in
the IO stage.  If we again incorporate the shifts, we are down to 4
instructions to compute a 32x32->32 product:


mul_32x32_from_32x16:
        mul/h   r0, r1, r3      ; r3 := r0 * r1[31:16]
        mul/l   r0, r1, r2      ; r2 := r0 * r1[15:0]
        shift   r3, 16, r3      ; r3 := r3 << 16
        add     r2, r3, r2      ; r2 := r2 + r3

Note that register forwarding does not work fully for the mul
instruction in this case, since it's split over two stages.  There is a
1 cycle delay before we can use the result, which means this is the only
way to order the instructions.
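
The arithmetic behind the four-instruction sequence can be checked with a
small model.  This is only a sketch in Python, not part of the proposal:
the function names are made up, operands are treated as unsigned 32-bit
values, and it assumes that mul/h and mul/l each truncate their result to
32 bits.

```python
MASK32 = 0xFFFFFFFF

def mul_h(a, b):
    """Assumed model of mul/h: a * b[31:16], truncated to 32 bits."""
    return (a * (b >> 16)) & MASK32

def mul_l(a, b):
    """Assumed model of mul/l: a * b[15:0], truncated to 32 bits."""
    return (a * (b & 0xFFFF)) & MASK32

def mul32(r0, r1):
    """Mirror the four instructions of mul_32x32_from_32x16."""
    r3 = mul_h(r0, r1)          # mul/h  r0, r1, r3
    r2 = mul_l(r0, r1)          # mul/l  r0, r1, r2
    r3 = (r3 << 16) & MASK32    # shift  r3, 16, r3
    r2 = (r2 + r3) & MASK32     # add    r2, r3, r2
    return r2
```

The model says nothing about timing; it only confirms that the two partial
products, with the high one shifted by 16, add up to the low 32 bits of
the full product.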

My guess is that the 16x16->32 multiplier with shifts on both the second
operand and the result is much cheaper than the extra adder and
multiplier of the 32x16->32 solution, and we save only one instruction
by going to 32x16->32.
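
For comparison, here is one way the 16x16->32 variant could be modelled.
This is a guess at the scheme, written in Python: the mul16 helper and its
b_high and shift_result flags are hypothetical stand-ins for "shifts on
both the second operand and the result", and the actual instruction
sequence may differ.

```python
MASK32 = 0xFFFFFFFF

def mul16(a, b, b_high=False, shift_result=False):
    """Hypothetical 16x16->32 multiply: a[15:0] times b[15:0] or b[31:16],
    with an optional left shift of the result by 16 (truncated to 32 bits)."""
    b16 = (b >> 16) & 0xFFFF if b_high else b & 0xFFFF
    p = (a & 0xFFFF) * b16
    if shift_result:
        p = (p << 16) & MASK32
    return p

def mul32_from_16x16(a, b):
    """Low 32 bits of a * b from three 16x16 partial products."""
    lo = mul16(a, b)                                   # a_lo * b_lo
    x1 = mul16(a, b, b_high=True, shift_result=True)   # (a_lo * b_hi) << 16
    x2 = mul16(b, a, b_high=True, shift_result=True)   # (b_lo * a_hi) << 16
    return (lo + x1 + x2) & MASK32                     # a_hi * b_hi falls off the top
```

Under these assumptions the product takes three multiplies and two adds,
i.e. five operations, which is consistent with the 32x16->32 version
saving only one instruction.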

How about if the shift was implicit in mul/h? That should be cheap in terms of hardware and it would decrease the cost of the soft 32x32 multiply to three cycles -- wouldn't it? (Sorry -- I have yet to read up on your architecture in detail.)
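
A quick model of that idea, as a Python sketch only: mul_h_shifted is a
hypothetical variant of mul/h that shifts its result left by 16 before
writing it back, and all values are treated as unsigned 32-bit.

```python
MASK32 = 0xFFFFFFFF

def mul_h_shifted(a, b):
    """Hypothetical mul/h with implicit shift: (a * b[31:16]) << 16, truncated."""
    return ((a * (b >> 16)) << 16) & MASK32

def mul_l(a, b):
    """Assumed model of mul/l: a * b[15:0], truncated to 32 bits."""
    return (a * (b & 0xFFFF)) & MASK32

def mul32_three_ops(r0, r1):
    """32x32->32 product in three operations: mul/h (shifted), mul/l, add."""
    r3 = mul_h_shifted(r0, r1)
    r2 = mul_l(r0, r1)
    return (r2 + r3) & MASK32
```

This only checks the arithmetic. Whether the sequence really costs three
cycles depends on the one-cycle delay before a mul result can be used.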

You can do 32x32 multiplies at nearly 200 MHz on the XC3S4000 with the caveat that they must be fully pipelined (4 stages -- just add the extra stages to the output of the inferred multiplier and XST will retime them back in). Can you afford to deepen the pipeline or stall on 32x32 multiplications?
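
To make the latency question concrete, here is a toy cycle model in
Python. It is an illustration only: the PipelinedMul class is made up, it
assumes "4 stages" means four register stages between issue and result,
and it models latency and throughput, not clock speed.

```python
from collections import deque

MASK32 = 0xFFFFFFFF

class PipelinedMul:
    """Toy model of a fully pipelined multiplier with a fixed latency."""

    def __init__(self, stages=4):
        # One slot per register stage; None represents a bubble.
        self.regs = deque([None] * stages, maxlen=stages)

    def clock(self, operands=None):
        """Advance one cycle.

        operands is an (a, b) pair to issue this cycle, or None for a bubble.
        Returns the product leaving the last stage, or None if it holds a bubble.
        """
        out = self.regs[-1]
        product = None
        if operands is not None:
            product = (operands[0] * operands[1]) & MASK32
        self.regs.appendleft(product)  # the oldest entry drops off the far end
        return out
```

Issuing one multiply per call to clock() gives one result per cycle, but
each result appears four calls after it was issued. An instruction that
needs the product right away would have to stall for that long, unless the
rest of the pipeline is deepened to match.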

Which exact part is on the OGD1 (speed grade & package)? (I may be totally lost, but you're using a Xilinx part, not the Lattice part on the svn schematic, right?)
