I assume the synthesis tool is automagically using the structure from the schematic on page 6 of this document:
http://www.xilinx.com/bvdocs/appnotes/xapp467.pdf
Is there no way to do the first step (the four multiplies) as extra logic in stage 2? No wait, that stage runs at clock_2x, so maybe in stage 1? And then do the final add of all the intermediate results in stage 4?
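
To make that split concrete, here is a rough Python model of it. It assumes the operands are unsigned and cut into 16-bit halves; the synthesis tool may well cut them differently, and the function names are made up for illustration. The point is only that the four multiplies are independent and could be registered before the adds.

```python
MASK16 = 0xFFFF

def partial_products(x, y):
    """Earlier stage: four independent 16x16 multiplies, one per 18x18 block."""
    xl, xh = x & MASK16, (x >> 16) & MASK16
    yl, yh = y & MASK16, (y >> 16) & MASK16
    return xl * yl, xl * yh, xh * yl, xh * yh

def sum_partials(ll, lh, hl, hh):
    """Later stage: shift the registered partial products and add them."""
    return ll + ((lh + hl) << 16) + (hh << 32)

def mul32(x, y):
    """Full 32x32 unsigned product built from the two stages above."""
    return sum_partials(*partial_products(x, y))
```

In hardware the tuple returned by partial_products would be the set of pipeline registers between the two stages.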

I have no idea how much delay there is through the dedicated hardware multiplier. Try clipping x and y to 17 bits and see what the synthesis results look like then. The result would be unusable, of course, but is it fast enough?
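
Why 17 bits: the 18x18 blocks are signed, so an unsigned operand has to leave the top bit at zero to fit in a single block. A tiny sketch of the experiment, assuming "clipping" means keeping only the low 17 bits (the helper names are hypothetical):

```python
MASK17 = (1 << 17) - 1

def clip17(v):
    """Keep only the low 17 bits, so v fits an 18-bit signed input as a non-negative value."""
    return v & MASK17

def mul_clipped(x, y):
    """One multiply, no partial-product adds; the result is wrong for wider operands."""
    return clip17(x) * clip17(y)
```

The product of two clipped operands always fits in 34 bits, so no adder chain is needed after the multiplier.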

Mike
www.wacco.mveas.com

PS: SVN seems to be down, so I'm looking at an old copy of hq.

On 12 Aug 2007, at 21:13, Timothy Normand Miller wrote:

I've checked in some changes to hq.  There are a few bug fixes and
also a hack to add an input port and an output port as synthesis
placeholders.

So, we have some synthesis results.  The winner is:  The multiplier.
To make a 32x32 multiplier, four of the 18x18's have to be bolted
together, and this is what we get:

Slack:                  -12.191ns (requirement - (data path - clock path skew + uncertainty))
  Source:               hq/stg2/y_lookup_r_16 (FF)
  Destination:          hq/stg3/res_r_25 (FF)
  Requirement:          10.000ns
  Data Path Delay:      22.191ns (Levels of Logic = 15)
  Clock Path Skew:      0.000ns
  Source Clock:         clock_2x_bufg rising at 10.000ns
  Destination Clock:    clock_bufg rising at 20.000ns
  Clock Uncertainty:    0.000ns
  Data Path: hq/stg2/y_lookup_r_16 to hq/stg3/res_r_25
    Delay type         Delay(ns)  Logical Resource(s)
    ----------------------------  -------------------
    Tcko                  0.626   hq/stg2/y_lookup_r_16
    net (fanout=1)        0.475   hq/stg2/y_lookup_r<16>
    Tilo                  0.529   hq/stg2/v_o<16>_SW0
    net (fanout=2)        0.016   N4985
    Tilo                  0.529   hq/stg2/y_o<16>1
    net (fanout=4)        3.689   hq/s2_y<16>
    Tmult                 3.851   hq/stg3/multiplier/Mmult_z_submult_2
    net (fanout=1)        4.221   hq/stg3/multiplier/Mmult_z_submult_2_25
    Topcyg                0.904   hq/stg3/multiplier/Mmult_z1_Madd_lut<25>
                                  hq/stg3/multiplier/Mmult_z1_Madd_cy<25>
    net (fanout=1)        0.000   hq/stg3/multiplier/Mmult_z1_Madd_cy<25>
    Tbyp                  0.111   hq/stg3/multiplier/Mmult_z1_Madd_cy<26>
                                  hq/stg3/multiplier/Mmult_z1_Madd_cy<27>
    net (fanout=1)        0.000   hq/stg3/multiplier/Mmult_z1_Madd_cy<29>
    Tciny                 0.803   hq/stg3/multiplier/Mmult_z1_Madd_cy<30>
                                  hq/stg3/multiplier/Mmult_z1_Madd_xor<31>
    net (fanout=1)        1.150   hq/stg3/multiplier/Mmult_z1_Madd_31
    Topcyg                0.954   hq/stg3/multiplier/Mmult_z2_Madd_lut<48>
                                  hq/stg3/multiplier/Mmult_z2_Madd_cy<48>
    net (fanout=1)        0.000   hq/stg3/multiplier/Mmult_z2_Madd_cy<48>
    Tbyp                  0.104   hq/stg3/multiplier/Mmult_z2_Madd_cy<49>
                                  hq/stg3/multiplier/Mmult_z2_Madd_cy<50>
    net (fanout=1)        0.000   hq/stg3/multiplier/Mmult_z2_Madd_cy<50>
    Tbyp                  0.104   hq/stg3/multiplier/Mmult_z2_Madd_cy<51>
                                  hq/stg3/multiplier/Mmult_z2_Madd_cy<52>
    net (fanout=1)        0.000   hq/stg3/multiplier/Mmult_z2_Madd_cy<52>
    Tbyp                  0.104   hq/stg3/multiplier/Mmult_z2_Madd_cy<53>
                                  hq/stg3/multiplier/Mmult_z2_Madd_cy<54>
    net (fanout=1)        0.000   hq/stg3/multiplier/Mmult_z2_Madd_cy<54>
    Tbyp                  0.104   hq/stg3/multiplier/Mmult_z2_Madd_cy<55>
                                  hq/stg3/multiplier/Mmult_z2_Madd_cy<56>
    net (fanout=1)        0.000   hq/stg3/multiplier/Mmult_z2_Madd_cy<56>
    Tcinx                 0.786   hq/stg3/multiplier/Mmult_z2_Madd_xor<57>
    net (fanout=2)        1.379   hq/stg3/Mshift_mul_shift0001_Sh<121>
    Tilo                  0.529   hq/stg3/res_r_mux0000<25>128
    net (fanout=1)        0.512   hq/stg3/res_r_mux0000<25>128/O
    Tfck                  0.600   hq/stg3/res_r_mux0000<25>2
                                  hq/stg3/res_r_25
    ----------------------------  ---------------------------
    Total                22.191ns (10.749ns logic, 11.442ns route)
                                  (48.4% logic, 51.6% route)


Too much multiply and add logic.  We want 10ns, but we're getting
22ns.  We need to think about ways to either stretch the pipeline, run
the multiply as a parallel pipeline, or use fewer bits in the
multiplier and/or multiplicand.
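
For a feel of how far the pipeline would have to stretch, here is a back-of-the-envelope calculation from the numbers in the report. It assumes the path could be cut into perfectly balanced pieces and ignores the register overhead each new stage adds, so the stage count is a lower bound rather than a plan.

```python
import math

REQUIREMENT_NS = 10.000  # "Requirement" from the timing report
DATA_PATH_NS = 22.191    # "Data Path Delay" from the timing report

# Skew and uncertainty are both 0.000 ns in the report, so the slack
# formula reduces to requirement minus data path delay.
slack_ns = REQUIREMENT_NS - DATA_PATH_NS

# Fewest 10 ns stages that could hold this path if it split evenly.
min_stages = math.ceil(DATA_PATH_NS / REQUIREMENT_NS)

print(f"slack = {slack_ns:.3f} ns, at least {min_stages} stages")
```

That is, the multiply and its adds would have to span at least three 10ns cycles instead of one.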

--
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
