The multiplier blocks in FPGAs are rather fast, so running them at two
or four times the clock speed could be possible. In an ASIC, though,
they would actually slow down the design because of their logic depth.
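To make the faster-clock idea concrete, here is a minimal sketch of how the quoted four-stage module could be driven from a 4x clock. The wrapper name and the clock4x port are hypothetical, not from the original post, and it assumes clock4x is exactly four times the system clock and phase-aligned to it (for example, both generated by the same clock manager), with A and B registered in the system-clock domain. Whether the 18x18 blocks and the adders would actually close timing at that rate is the open question.

```verilog
// Hypothetical wrapper, not part of the original post.
// Assumptions: clock4x runs at exactly 4x the system clock and is
// phase-aligned to it; A and B change only on system-clock edges.
module one_cycle_signed_35x35_multiply(
    input         clock4x,
    input  [34:0] A,
    input  [34:0] B,
    output [69:0] P);

    // Each of the four pipeline stages takes one clock4x period, so P
    // updates on the fourth fast edge after A and B change. That edge
    // coincides with the next system-clock edge, so logic in the
    // system-clock domain sees a single clock of latency.
    four_stage_signed_35x35_multiply mult (
        .clock(clock4x),
        .A(A),
        .B(B),
        .P(P));
endmodule
```

Because A and B hold steady for four fast cycles, the pipeline recomputes the same product on each of them, so P only changes at system-clock boundaries.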
On 2013-01-13 18:52, Timothy Normand Miller wrote:
The multipliers are probably going to be the biggest performance
bottleneck in the design. Depending on what blocks are available we
might be able to pipeline it more deeply in order to get higher
frequency. As it is, it's fully pipelined at whatever frequency an
18x18 multiplier will allow.
On Sun, Jan 13, 2013 at 5:31 PM, "Ing. Daniel Rozsnyó"
<[email protected]> wrote:
I know that this is a generic multiplier, but in practice, would
it map 1:1 to logic gates, or would it be possible to multiply
the I/O frequency locally by 4 (e.g. 1 GHz -> 4 GHz) to
achieve a one-clock-delay multiply?
Daniel
On 01/13/2013 09:46 PM, Timothy Normand Miller wrote:
// TODO: Actually use clock enables
module four_stage_signed_35x35_multiply(
    input clock,
    input [34:0] A,
    input [34:0] B,
    output reg [69:0] P);

    // Pipeline stage 0: Perform all multiplies
    wire [35:0] p0a, p2a, p3a;
    wire [33:0] p1a;
    MULT18X18S mul0 (.C(clock), .CE(1'b1), .R(1'b0), .P(p0a),
        .A(A[34:17]), .B(B[34:17]));
    MULT18X18S mul1 (.C(clock), .CE(1'b1), .R(1'b0), .P(p1a),
        .A({1'b0, A[16:0]}), .B({1'b0, B[16:0]}));
    MULT18X18S mul2 (.C(clock), .CE(1'b1), .R(1'b0), .P(p2a),
        .A(A[34:17]), .B({1'b0, B[16:0]}));
    MULT18X18S mul3 (.C(clock), .CE(1'b1), .R(1'b0), .P(p3a),
        .A({1'b0, A[16:0]}), .B(B[34:17]));

    // Pipeline stage 1: Sum middle terms
    reg [35:0] p0b, p2b;
    reg [33:0] p1b;
    always @(posedge clock) begin
        p0b <= p0a;
        p1b <= p1a;
        p2b <= p2a + p3a;
    end

    // Pipeline stage 2: Lower half of final sum
    wire [34:0] wlower_a, wlower_b, wupper_a, wupper_b;
    assign {wupper_a, wlower_a} = {p0b, p1b};
    assign {wupper_b, wlower_b} = {{17{p2b[35]}}, p2b, {17{1'b0}}};
    reg [34:0] upper_a, upper_b;
    reg [35:0] lower_sum;
    always @(posedge clock) begin
        lower_sum <= wlower_a + wlower_b;
        upper_a <= wupper_a;
        upper_b <= wupper_b;
    end

    // Pipeline stage 3: Upper half of final sum, with carry in
    wire [35:0] upper_sum = {upper_a, 1'b1} + {upper_b, lower_sum[35]};
    always @(posedge clock) begin
        P[34:0] <= lower_sum[34:0];
        P[69:35] <= upper_sum[35:1];
    end
endmodule
// synthesis translate_off
module MULT18X18S(
    input C,
    input CE,
    input R,
    output reg [35:0] P,
    input [17:0] A,
    input [17:0] B);

    wire signed [17:0] a, b;
    assign a = A;
    assign b = B;
    wire signed [35:0] p;
    assign p = a * b;
    always @(posedge C) begin
        if (R) begin
            P <= 0;
        end else if (CE) begin
            P <= p;
        end
    end
endmodule
// synthesis translate_on
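The module above splits each operand as A = AH*2^17 + AL, with AH = A[34:17] treated as signed and AL = A[16:0] as unsigned, so that A*B = AH*BH*2^34 + (AH*BL + AL*BH)*2^17 + AL*BL. A minimal self-checking testbench along these lines could compare the pipeline against a plain signed multiply. This is only a sketch: the testbench name, the stimulus values, and the choice of 1000 random vectors are assumptions, and it relies on the behavioral MULT18X18S model above being compiled in.

```verilog
// Hypothetical testbench sketch, not part of the original post.
`timescale 1ns/1ps
module tb_four_stage_signed_35x35_multiply;
    reg                clock = 1'b0;
    reg  signed [34:0] A = 35'sd0;
    reg  signed [34:0] B = 35'sd0;
    wire        [69:0] P;

    four_stage_signed_35x35_multiply dut (.clock(clock), .A(A), .B(B), .P(P));

    always #5 clock = ~clock;

    // Reference model: full-width signed product, delayed by four clocks
    // to match the latency of the pipeline under test.
    reg signed [69:0] exp0, exp1, exp2, exp3;
    integer cycle  = 0;
    integer errors = 0;

    always @(posedge clock) begin
        exp0  <= A * B;
        exp1  <= exp0;
        exp2  <= exp1;
        exp3  <= exp2;
        cycle <= cycle + 1;
        // Skip the first four edges while both pipelines fill.
        if (cycle >= 4 && P !== exp3) begin
            errors = errors + 1;
            $display("MISMATCH: P=%h expected=%h", P, exp3);
        end
    end

    integer i;
    initial begin
        // Directed corner cases, then random operands. Inputs change on
        // the falling edge so they are stable when the DUT samples them.
        @(negedge clock) begin A = 35'sh4_0000_0000; B = 35'sh4_0000_0000; end // min * min
        @(negedge clock) begin A = 35'sh3_FFFF_FFFF; B = 35'sh4_0000_0000; end // max * min
        @(negedge clock) begin A = -35'sd1;          B = 35'sh3_FFFF_FFFF; end // -1 * max
        for (i = 0; i < 1000; i = i + 1)
            @(negedge clock) begin
                A = {$random, $random};  // 64 random bits truncated to 35
                B = {$random, $random};
            end
        repeat (6) @(negedge clock);     // let the last results drain
        if (errors == 0) $display("PASS");
        else             $display("FAIL: %0d mismatches", errors);
        $finish;
    end
endmodule
```

The four-deep expected-value pipeline mirrors the one registered multiplier stage plus the three adder stages, so each comparison pairs P with the product of the operands that produced it.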
--
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)