The multipliers are probably going to be the biggest performance bottleneck in the design. Depending on what blocks are available, we might be able to pipeline them more deeply to reach a higher clock frequency. As it is, the design is fully pipelined and runs at whatever frequency an 18x18 multiplier will allow.
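To make "pipeline more deeply" concrete, here is a rough sketch of a behavioral 18x18 signed multiplier with registers on both the inputs and the output. The module name mult18x18_2stage and the registers a_r and b_r are made up for illustration. It assumes the target's multiplier block has optional internal pipeline registers, or that the synthesis tool can retime registers into the multiplier, which is not true of every device.

```verilog
// Hypothetical two-stage variant of the MULT18X18S simulation model.
// Assumption: the target can absorb the extra register into its
// multiplier block; otherwise this only adds latency.
module mult18x18_2stage(
    input C,
    input CE,
    input R,
    output reg [35:0] P,
    input [17:0] A,
    input [17:0] B);

    // Stage 0a: registered operands, treated as signed
    reg signed [17:0] a_r, b_r;

    // Signed 18x18 product of the registered operands
    wire signed [35:0] p;
    assign p = a_r * b_r;

    always @(posedge C) begin
        if (R) begin
            a_r <= 0;
            b_r <= 0;
            P <= 0;
        end else
        if (CE) begin
            a_r <= A;
            b_r <= B;
            // Stage 0b: registered product, two clocks after A and B
            P <= p;
        end
    end

endmodule
```

Swapping this in for all four multipliers would add one clock of latency, so the overall multiply would take five stages instead of four. The later stages would not need to change, since every partial product is delayed by the same amount.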
On Sun, Jan 13, 2013 at 5:31 PM, "Ing. Daniel Rozsnyó" <[email protected]> wrote:

> I know that this is a generic multiplier, but in practice, would that map
> 1:1 to logic gates, or would it be possible to multiply the i/o frequency
> locally by 4 times (e.g. 1GHz -> 4GHz) to achieve a one clock delay
> multiply?
>
> Daniel
>
> On 01/13/2013 09:46 PM, Timothy Normand Miller wrote:
>
>> // TODO: Actually use clock enables
>>
>> module four_stage_signed_35x35_multiply(
>>     input clock,
>>     input [34:0] A,
>>     input [34:0] B,
>>     output reg [69:0] P);
>>
>> // Pipeline state 0: Perform all multiplies
>> wire [35:0] p0a, p2a, p3a;
>> wire [33:0] p1a;
>> MULT18X18S mul0 (.C(clock), .CE(1'b1), .R(1'b0), .P(p0a),
>>     .A(A[34:17]), .B(B[34:17]));
>> MULT18X18S mul1 (.C(clock), .CE(1'b1), .R(1'b0), .P(p1a),
>>     .A({1'b0, A[16:0]}), .B({1'b0, B[16:0]}));
>> MULT18X18S mul2 (.C(clock), .CE(1'b1), .R(1'b0), .P(p2a),
>>     .A(A[34:17]), .B({1'b0, B[16:0]}));
>> MULT18X18S mul3 (.C(clock), .CE(1'b1), .R(1'b0), .P(p3a),
>>     .A({1'b0, A[16:0]}), .B(B[34:17]));
>>
>> // Pipeline stage 1: Sum middle terms
>> reg [35:0] p0b, p2b;
>> reg [33:0] p1b;
>> always @(posedge clock) begin
>>     p0b <= p0a;
>>     p1b <= p1a;
>>     p2b <= p2a + p3a;
>> end
>>
>> // Pipeline stage 2: Lower half of final sum
>> wire [34:0] wlower_a, wlower_b, wupper_a, wupper_b;
>> assign {wupper_a, wlower_a} = {p0b, p1b};
>> assign {wupper_b, wlower_b} = {{17{p2b[35]}}, p2b, {17{1'b0}}};
>> reg [34:0] upper_a, upper_b;
>> reg [35:0] lower_sum;
>> always @(posedge clock) begin
>>     lower_sum <= wlower_a + wlower_b;
>>     upper_a <= wupper_a;
>>     upper_b <= wupper_b;
>> end
>>
>> // Pipeline stage 3: Upper half of final sum, with carry in
>> wire [35:0] upper_sum = {upper_a, 1'b1} + {upper_b, lower_sum[35]};
>> always @(posedge clock) begin
>>     P[34:0] <= lower_sum[34:0];
>>     P[69:35] <= upper_sum[35:1];
>> end
>>
>> endmodule
>>
>>
>> // synthesis translate_off
>> module MULT18X18S(
>>     input C,
>>     input CE,
>>     input R,
>>     output reg [35:0] P,
>>     input [17:0] A,
>>     input [17:0] B);
>>
>> wire signed [17:0] a, b;
>> assign a = A;
>> assign b = B;
>>
>> wire signed [35:0] p;
>> assign p = a * b;
>>
>> always @(posedge C) begin
>>     if (R) begin
>>         P <= 0;
>>     end else
>>     if (CE) begin
>>         P <= p;
>>     end
>> end
>>
>> endmodule
>> // synthesis translate_on
>>
>>
>> --
>> Timothy Normand Miller, PhD
>> Assistant Professor of Computer Science, Binghamton University
>> http://www.cs.binghamton.edu/~millerti/
>> Open Graphics Project
>>
>> _______________________________________________
>> Open-graphics mailing list
>> [email protected]
>> http://lists.duskglow.com/mailman/listinfo/open-graphics
>> List service provided by Duskglow Consulting, LLC (www.duskglow.com)

--
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project
