When I start thinking about bits per joule (or multiplies per joule), I
start wondering: could we run the multiplier(s) on a separate clock from
everything else, and scale that clock up and down under software control?
The software would know whether a particular multiply is on the critical
path of some other computation, or is just a bulk-parallel multiply where
total energy matters more than time-to-answer.
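A textbook first-order CMOS energy model makes the trade-off explicit. It is an approximation, not a measurement of this design. The symbols are assumptions introduced for this sketch: alpha is the switching activity, C the multiplier's switched capacitance, V_dd its supply voltage, f its clock, and P_leak its leakage power. One multiply result per clock is assumed:

```latex
P_{\text{mul}} \approx \alpha\, C\, V_{dd}^{2}\, f + P_{\text{leak}}
\qquad\Longrightarrow\qquad
E_{\text{mul}} = \frac{P_{\text{mul}}}{f} \approx \alpha\, C\, V_{dd}^{2} + \frac{P_{\text{leak}}}{f}
```

Read this way, slowing the clock at a fixed V_dd lowers power. However, it leaves the switching energy per multiply unchanged and spends more leakage energy per result. The joules saved on bulk-parallel work come from the lower V_dd that a slower clock permits. For the multiplier alone, that means giving it its own supply voltage as well as its own clock.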


On Mon, Jan 14, 2013 at 10:46:03AM -0500, Timothy Normand Miller wrote:
> Where I have used these, the worst part is the wire delay from logic to the
> multiplier block and back again.  I have often had to add extra registers
> on the inputs and outputs just to get rid of those delay bottlenecks.
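Here is a minimal sketch of the register insertion described above. It assumes a synthesizer that infers a hard 18x18 multiplier from a registered signed `*`. The module and port names are hypothetical. The operands get one register stage that can sit next to the multiplier block, and the product gets one more before it heads back across the fabric. As a result, neither long route shares a clock cycle with the multiply itself. The cost is latency: a product appears three clocks after its operands are sampled.

```verilog
// Hypothetical wrapper: a signed 18x18 multiply with an extra register
// stage on the operands and another on the product, so the routes to and
// from the multiplier block each get a full clock cycle.
module registered_mult18 (
    input  wire               clock,
    input  wire signed [17:0] a_in,
    input  wire signed [17:0] b_in,
    output reg  signed [35:0] p_out   // a_in * b_in, three clocks later
);
    reg signed [17:0] a_q, b_q;  // input registers (ideally beside the block)
    reg signed [35:0] p_q;       // product register (the block's own output reg)

    always @(posedge clock) begin
        a_q   <= a_in;           // stage 1: absorb the route into the block
        b_q   <= b_in;
        p_q   <= a_q * b_q;      // stage 2: the multiply itself
        p_out <= p_q;            // stage 3: absorb the route back out
    end
endmodule
```

Whether those registers actually land beside the multiplier is up to the placer. On devices whose multiplier blocks have built-in pipeline registers, synthesis may also absorb some of these stages into the block.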
> 
> 
> On Sun, Jan 13, 2013 at 7:17 PM, André Pouliot <[email protected]> wrote:
> 
> > The multiplier blocks in FPGAs are rather fast, so running them at twice or
> > four times the clock speed could be possible. In an ASIC they would actually
> > slow down the design because of the logic depth.
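The following is a rough sketch of what running a multiplier at twice the system clock ("double-pumping") means, shown from the fast-clock side only. A single shared signed 18x18 multiply serves two logical lanes by alternating between them on every fast cycle. Each lane therefore gets one product per two fast cycles, which is one per cycle of a half-rate system clock. The following are all simplifying assumptions:

- the module name;
- the externally driven `phase` toggle;
- leaving out the half-rate clock domain and its crossing.

A real design would also have to close timing at the doubled frequency.

```verilog
// Hypothetical sketch of 2:1 multi-pumping, modeled entirely in the fast
// clock domain; the half-rate clock and the crossings into and out of it
// are deliberately omitted.
module shared_mult18_2to1 (
    input  wire               clock_fast,
    input  wire               phase,       // 0: serve lane 0, 1: serve lane 1
    input  wire signed [17:0] a0, b0,      // lane 0 operands
    input  wire signed [17:0] a1, b1,      // lane 1 operands
    output reg  signed [35:0] p0, p1       // per-lane products, refreshed every 2 cycles
);
    // Select the operands first so that only one multiplier is inferred.
    wire signed [17:0] a_sel = phase ? a1 : a0;
    wire signed [17:0] b_sel = phase ? b1 : b0;

    reg signed [35:0] prod;       // output of the single shared multiply
    reg               prod_lane;  // lane that 'prod' belongs to

    always @(posedge clock_fast) begin
        prod      <= a_sel * b_sel;
        prod_lane <= phase;
        // Steer the previous cycle's product to its lane's output register.
        if (prod_lane)
            p1 <= prod;
        else
            p0 <= prod;
    end
endmodule
```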
> >
> >
> >
> > On 2013-01-13 18:52, Timothy Normand Miller wrote:
> >
> >> The multipliers are probably going to be the biggest performance
> >> bottleneck in the design.  Depending on what blocks are available we might
> >> be able to pipeline it more deeply in order to get higher frequency.  As it
> >> is, it's fully pipelined at whatever frequency an 18x18 multiplier will
> >> allow.
> >>
> >>
> >> On Sun, Jan 13, 2013 at 5:31 PM, "Ing. Daniel Rozsnyó" <[email protected]> wrote:
> >>
> >>     I know that this is a generic multiplier, but in practice, would
> >>     that map 1:1 to logic gates, or would it be possible to multiply
> >>     the I/O clock frequency locally by four (e.g., 1 GHz -> 4 GHz) to
> >>     achieve a multiply with a latency of one I/O clock?
> >>
> >>     Daniel
> >>
> >>
> >>
> >>     On 01/13/2013 09:46 PM, Timothy Normand Miller wrote:
> >>
> >>
> >>
> >>
> >>
> >>         // TODO:  Actually use clock enables
> >>
> >>         module four_stage_signed_35x35_multiply(
> >>              input clock,
> >>              input [34:0] A,
> >>              input [34:0] B,
> >>              output reg [69:0] P);
> >>
> >>         // Pipeline stage 0:  Perform all multiplies (MULT18X18S registers its output)
> >>         // Operands split as Ah = A[34:17] (signed), Al = A[16:0] (unsigned); same for B
> >>         wire [35:0] p0a, p2a, p3a;
> >>         wire [33:0] p1a;   // Al*Bl is unsigned and fits in 34 bits
> >>         MULT18X18S mul0 (.C(clock), .CE(1'b1), .R(1'b0), .P(p0a),   // Ah*Bh
> >>                          .A(A[34:17]), .B(B[34:17]));
> >>         MULT18X18S mul1 (.C(clock), .CE(1'b1), .R(1'b0), .P(p1a),   // Al*Bl
> >>                          .A({1'b0, A[16:0]}), .B({1'b0, B[16:0]}));
> >>         MULT18X18S mul2 (.C(clock), .CE(1'b1), .R(1'b0), .P(p2a),   // Ah*Bl
> >>                          .A(A[34:17]), .B({1'b0, B[16:0]}));
> >>         MULT18X18S mul3 (.C(clock), .CE(1'b1), .R(1'b0), .P(p3a),   // Al*Bh
> >>                          .A({1'b0, A[16:0]}), .B(B[34:17]));
> >>
> >>         // Pipeline stage 1:  Sum middle terms
> >>         reg [35:0] p0b, p2b;
> >>         reg [33:0] p1b;
> >>         always @(posedge clock) begin
> >>              p0b <= p0a;
> >>              p1b <= p1a;
> >>              p2b <= p2a + p3a;
> >>         end
> >>
> >>         // Pipeline stage 2:  Lower half of final sum
> >>         wire [34:0] wlower_a, wlower_b, wupper_a, wupper_b;
> >>         assign {wupper_a, wlower_a} = {p0b, p1b};
> >>         assign {wupper_b, wlower_b} = {{17{p2b[35]}}, p2b, {17{1'b0}}};
> >>         reg [34:0] upper_a, upper_b;
> >>         reg [35:0] lower_sum;
> >>         always @(posedge clock) begin
> >>              lower_sum <= wlower_a + wlower_b;
> >>              upper_a <= wupper_a;
> >>              upper_b <= wupper_b;
> >>         end
> >>
> >>         // Pipeline stage 3:  Upper half of final sum, with carry in
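> >>         // The LSB position adds 1 + lower_sum[35], which carries into bit 1
> >>         // exactly when the lower half carried out; bits [35:1] are the upper sum.
> >>         wire [35:0] upper_sum = {upper_a, 1'b1} + {upper_b, lower_sum[35]};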
> >>         always @(posedge clock) begin
> >>              P[34:0] <= lower_sum[34:0];
> >>              P[69:35] <= upper_sum[35:1];
> >>         end
> >>
> >>         endmodule
> >>
> >>
> >>         // synthesis translate_off
> >>         module MULT18X18S(
> >>              input C,
> >>              input CE,
> >>              input R,
> >>              output reg [35:0] P,
> >>              input [17:0] A,
> >>              input [17:0] B);
> >>
> >>         wire signed [17:0] a, b;
> >>         assign a = A;
> >>         assign b = B;
> >>
> >>         wire signed [35:0] p;
> >>         assign p = a * b;
> >>
> >>         always @(posedge C) begin
> >>              if (R) begin
> >>                  P <= 0;
> >>              end else
> >>              if (CE) begin
> >>                  P <= p;
> >>              end
> >>         end
> >>
> >>         endmodule
> >>         // synthesis translate_on
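The arithmetic behind the multiplier above can be written out as follows. The names A_H, A_L, B_H, B_L, p_0 through p_3, u_a, u_b and c are notation introduced here: p_i stands for the output of `mul`i, u_a and u_b for `upper_a` and `upper_b`, and c for `lower_sum[35]`. The bit ranges come from the code. Each 35-bit signed operand is split at bit 17 into a signed 18-bit upper part and an unsigned 17-bit lower part:

```latex
\begin{aligned}
A &= 2^{17} A_H + A_L, \qquad B = 2^{17} B_H + B_L, \\
  &\quad A_H, B_H \in [-2^{17},\, 2^{17}-1] \text{ (signed)}, \quad A_L, B_L \in [0,\, 2^{17}-1] \text{ (unsigned)} \\
AB &= 2^{34} \underbrace{A_H B_H}_{p_0}
    + 2^{17} \bigl(\underbrace{A_H B_L}_{p_2} + \underbrace{A_L B_H}_{p_3}\bigr)
    + \underbrace{A_L B_L}_{p_1} \\
0 &\le p_1 < 2^{34} \;\Rightarrow\; \{p_0, p_1\} = 2^{34} p_0 + p_1 \\
-2^{35} &< p_2 + p_3 < 2^{35} \;\Rightarrow\; p_2 + p_3 \text{ fits in 36 signed bits} \\
\{u_a, 1\} + \{u_b, c\} &= 2(u_a + u_b) + (1 + c), \quad c \in \{0, 1\} \\
  &\Rightarrow\; \text{bits } [35{:}1] = u_a + u_b + c \pmod{2^{35}}
\end{aligned}
```

Stage 1 forms p_2 + p_3. Stage 2 adds the low 35 bits of the two 70-bit addends. Stage 3 adds their high 35 bits together with the carry c, using the 1 + c trick in the last line.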
> >>
> >>
> >>         --
> >>         Timothy Normand Miller, PhD
> >>         Assistant Professor of Computer Science, Binghamton University
> >>         http://www.cs.binghamton.edu/~millerti/
> >>         Open Graphics Project
> >>
> >>
> >>         _______________________________________________
> >>         Open-graphics mailing list
> >>         [email protected]
> >>         http://lists.duskglow.com/mailman/listinfo/open-graphics
> >>         List service provided by Duskglow Consulting, LLC
> >>         (www.duskglow.com)


-- 
--------------------------------------------------------------------------
Troy Benjegerdes                'da hozer'                 [email protected]

Someone asked me why I work on this free (http://www.fsf.org/philosophy/)
software & hardware (http://q3u.be) stuff instead of getting a real job.
Charles Schulz had the best answer:

"Why do musicians compose symphonies and poets write poems? They do it
because life wouldn't have any meaning for them if they didn't. That's why
I draw cartoons. It's my life." -- Charles Schulz