Re: [Open-graphics] 4-stage signed multiplier 35x35

Timothy Normand Miller Wed, 23 Jan 2013 11:37:51 -0800

With a barrel processor design, expect floods of multiplies, because every
thread in a warp executes the same multiply instruction during the same warp
interval.  Matrix multiplies produce the same kind of flood.  It's certainly
a good idea, however, to consider implementing a multi-cycle multiplier,
keeping in mind that it would impact instruction scheduling.
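
To put rough numbers on that trade-off, here is a minimal sketch in Python.
It assumes "multi-cycle" means a unit that stays busy for the full latency of
each multiply, while a fully pipelined unit accepts a new multiply every
cycle.  The function names and the burst of 64 multiplies are made up for
illustration; the latency of 4 just mirrors the four-stage design in this
thread.

```python
def cycles_pipelined(n_multiplies, latency):
    """Cycles to finish a burst on a fully pipelined multiplier.

    One multiply issues per cycle; the last result appears `latency`
    cycles after it was issued.
    """
    if n_multiplies == 0:
        return 0
    return n_multiplies + latency - 1


def cycles_multicycle(n_multiplies, latency):
    """Cycles to finish a burst on a non-pipelined multi-cycle multiplier.

    Assumption: the unit is busy for `latency` cycles per multiply, so
    back-to-back multiplies serialize.
    """
    return n_multiplies * latency


if __name__ == "__main__":
    burst, latency = 64, 4  # hypothetical burst; 4 stages as in this thread
    print(cycles_pipelined(burst, latency))   # 67
    print(cycles_multicycle(burst, latency))  # 256
```

Under those assumptions the gap grows linearly with the size of the flood,
which is why the scheduler would have to know about the multiplier's
occupancy.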


5 minutes?  Servicing a page fault is measured in milliseconds, which in
CPU terms is already a million years.  What you want are short (hundreds to
thousands of instructions) representative micro-benchmarks.  If you want to
consider the instruction distribution for certain real-world workloads, a
single frame from somewhere in the middle of the job is already pretty
serious overkill.
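
As a rough illustration of what "instruction distribution" means here, the
sketch below tallies the opcode mix of a short trace.  The trace format (a
plain list of opcode mnemonics) and the name instruction_mix are assumptions
for the example, not anything our simulator produces today.

```python
from collections import Counter


def instruction_mix(trace):
    """Return {opcode: fraction of the trace} for an iterable of mnemonics."""
    counts = Counter(trace)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {opcode: n / total for opcode, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical eight-instruction trace from a micro-benchmark.
    trace = ["mul", "add", "mul", "load", "mul", "store", "add", "mul"]
    for opcode, fraction in sorted(instruction_mix(trace).items()):
        print(f"{opcode:6s} {fraction:.3f}")
```

A few thousand instructions of a representative kernel run through
something like this should already show how prevalent multiplies are.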

Mind you, if our GPU simulator can simulate 5 minutes in less than a few
days, that'll be a solid achievement.
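
For scale, a back-of-the-envelope sketch of the simulation rate that would
take.  The 1 GHz clock is borrowed from the quoted message below, and
reading "a few days" as three days is my assumption; the function name is
made up.

```python
def required_sim_rate(simulated_seconds, clock_hz, wall_seconds):
    """Simulated cycles per wall-clock second needed to finish in time."""
    return simulated_seconds * clock_hz / wall_seconds


if __name__ == "__main__":
    five_minutes = 5 * 60            # seconds of simulated time
    three_days = 3 * 24 * 3600       # assumed wall-clock budget, in seconds
    rate = required_sim_rate(five_minutes, 1e9, three_days)
    print(f"{rate:,.0f} simulated cycles per wall-clock second")
```

That works out to roughly 1.16 million simulated cycles per second,
sustained for three days.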


On Tue, Jan 22, 2013 at 12:52 PM, Troy Benjegerdes <[email protected]> wrote:

> Is there any reasonable way to get a number on how many multiplies (and
> other types of operations, including bits moved) are required for the
> following:
>
> * 5 minutes of average Facebook browser/rendering/GPU activity
> * 5 minutes of playing a DVD
> * 5 minutes of playing the MegaGlest game
> * 5 minutes of compiling Linux/Android/LibreOffice/Firefox (say, 1 GHz CPU)
>
> On Tue, Jan 22, 2013 at 03:45:06AM -0500, Timothy Normand Miller wrote:
> > Consider the use cases and the prevalence of multiplies.
> >
> >
> > On Tue, Jan 22, 2013 at 1:49 AM, Troy Benjegerdes <[email protected]> wrote:
> >
> > > well, why not clock rates of 0.5x, 1x, and 2x?  (where the nominal design
> > > for the thermal envelope of the package is 1x, and then whatever is in
> > > the critical path can go to 2x if a nearby multiplier that is not in the
> > > critical path can go to 0.5x)
> > >
> > > I'm thinking this may be serious overengineering unless we have
> > > distributed power conversion and voltage regulators to go along with
> > > this scheme (the slower-clocked areas get lower voltage).
> > >
> > > All that being said, power-of-2 clock multipliers might be helpful for
> > > ASIC vs. FPGA design flexibility.
> > >
> > > On Mon, Jan 14, 2013 at 07:38:26PM -0500, Timothy Normand Miller wrote:
> > > > If it's not something like 2x or 1/2x or another power of two, it's
> > > > basically impossible to do in the middle of a pipeline.  Also, keep in
> > > > mind that we reuse this for integer left shift and integer multiply,
> > > > so we'll keep it relatively busy, and we can also clock-gate.
> > > >
> > > >
> > > > On Mon, Jan 14, 2013 at 5:04 PM, Troy Benjegerdes <[email protected]> wrote:
> > > >
> > > > > When I start thinking about bits per joule (or multiplies per
> > > > > joule), I start wondering if we can run the multiplier(s) on a
> > > > > separate clock from everything else, and be able to scale the speed
> > > > > up and down depending on some software algorithms that know if this
> > > > > particular multiply is in the critical path for some other
> > > > > computation, or if it's just a bulk-parallel multiply where total
> > > > > energy matters more than time-to-answer?
> > > > >
> > > > >
> > > > > On Mon, Jan 14, 2013 at 10:46:03AM -0500, Timothy Normand Miller wrote:
> > > > > > Where I have used these, the worst part is the wire delay from
> > > > > > logic to the multiplier block and back again.  I have often had to
> > > > > > add extra registers on inputs and outputs just to get rid of those
> > > > > > delay bottlenecks.
> > > > > >
> > > > > >
> > > > > > On Sun, Jan 13, 2013 at 7:17 PM, André Pouliot <[email protected]> wrote:
> > > > > >
> > > > > > > The multiplier blocks in an FPGA are rather fast, so running them
> > > > > > > at twice or 4 times the clock speed could be possible.  In an ASIC
> > > > > > > they would actually slow down the design because of the logic
> > > > > > > depth.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On 2013-01-13 18:52, Timothy Normand Miller wrote:
> > > > > > >
> > > > > > >> The multipliers are probably going to be the biggest performance
> > > > > > >> bottleneck in the design.  Depending on what blocks are
> > > > > > >> available, we might be able to pipeline it more deeply in order
> > > > > > >> to get higher frequency.  As it is, it's fully pipelined at
> > > > > > >> whatever frequency an 18x18 multiplier will allow.
> > > > > > >>
> > > > > > >>
> > > > > > >> On Sun, Jan 13, 2013 at 5:31 PM, "Ing. Daniel Rozsny?" <[email protected]> wrote:
> > > > > > >>
> > > > > > >>     I know that this is a generic multiplier, but in practice,
> > > > > > >>     would that map 1:1 to logic gates, or would it be possible
> > > > > > >>     to multiply the i/o frequency locally by 4 times (e.g. 1GHz
> > > > > > >>     -> 4GHz) to achieve a one clock delay multiply?
> > > > > > >>
> > > > > > >>     Daniel
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >>     On 01/13/2013 09:46 PM, Timothy Normand Miller wrote:
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >>         // TODO:  Actually use clock enables
> > > > > > >>
> > > > > > >>         module four_stage_signed_35x35_multiply(
> > > > > > >>              input clock,
> > > > > > >>              input [34:0] A,
> > > > > > >>              input [34:0] B,
> > > > > > >>              output reg [69:0] P);
> > > > > > >>
> > > > > > >>         // Pipeline stage 0:  Perform all multiplies
> > > > > > >>         // Each operand is split into a signed upper 18 bits ([34:17])
> > > > > > >>         // and an unsigned lower 17 bits ([16:0], zero-extended to 18).
> > > > > > >>         wire [35:0] p0a, p2a, p3a;
> > > > > > >>         wire [33:0] p1a;
> > > > > > >>         MULT18X18S mul0 (.C(clock), .CE(1'b1), .R(1'b0), .P(p0a),
> > > > > > >>             .A(A[34:17]), .B(B[34:17]));
> > > > > > >>         MULT18X18S mul1 (.C(clock), .CE(1'b1), .R(1'b0), .P(p1a),
> > > > > > >>             .A({1'b0, A[16:0]}), .B({1'b0, B[16:0]}));
> > > > > > >>         MULT18X18S mul2 (.C(clock), .CE(1'b1), .R(1'b0), .P(p2a),
> > > > > > >>             .A(A[34:17]), .B({1'b0, B[16:0]}));
> > > > > > >>         MULT18X18S mul3 (.C(clock), .CE(1'b1), .R(1'b0), .P(p3a),
> > > > > > >>             .A({1'b0, A[16:0]}), .B(B[34:17]));
> > > > > > >>
> > > > > > >>         // Pipeline stage 1:  Sum middle terms
> > > > > > >>         reg [35:0] p0b, p2b;
> > > > > > >>         reg [33:0] p1b;
> > > > > > >>         always @(posedge clock) begin
> > > > > > >>              p0b <= p0a;
> > > > > > >>              p1b <= p1a;
> > > > > > >>              p2b <= p2a + p3a;
> > > > > > >>         end
> > > > > > >>
> > > > > > >>         // Pipeline stage 2:  Lower half of final sum
> > > > > > >>         wire [34:0] wlower_a, wlower_b, wupper_a, wupper_b;
> > > > > > >>         assign {wupper_a, wlower_a} = {p0b, p1b};
> > > > > > >>         assign {wupper_b, wlower_b} = {{17{p2b[35]}}, p2b, {17{1'b0}}};
> > > > > > >>         reg [34:0] upper_a, upper_b;
> > > > > > >>         reg [35:0] lower_sum;
> > > > > > >>         always @(posedge clock) begin
> > > > > > >>              lower_sum <= wlower_a + wlower_b;
> > > > > > >>              upper_a <= wupper_a;
> > > > > > >>              upper_b <= wupper_b;
> > > > > > >>         end
> > > > > > >>
> > > > > > >>         // Pipeline stage 3:  Upper half of final sum, with carry in
> > > > > > >>         wire [35:0] upper_sum = {upper_a, 1'b1} + {upper_b, lower_sum[35]};
> > > > > > >>         always @(posedge clock) begin
> > > > > > >>              P[34:0] <= lower_sum[34:0];
> > > > > > >>              P[69:35] <= upper_sum[35:1];
> > > > > > >>         end
> > > > > > >>
> > > > > > >>         endmodule
> > > > > > >>
> > > > > > >>
> > > > > > >>         // synthesis translate_off
> > > > > > >>         module MULT18X18S(
> > > > > > >>              input C,
> > > > > > >>              input CE,
> > > > > > >>              input R,
> > > > > > >>              output reg [35:0] P,
> > > > > > >>              input [17:0] A,
> > > > > > >>              input [17:0] B);
> > > > > > >>
> > > > > > >>         wire signed [17:0] a, b;
> > > > > > >>         assign a = A;
> > > > > > >>         assign b = B;
> > > > > > >>
> > > > > > >>         wire signed [35:0] p;
> > > > > > >>         assign p = a * b;
> > > > > > >>
> > > > > > >>         always @(posedge C) begin
> > > > > > >>              if (R) begin
> > > > > > >>                  P <= 0;
> > > > > > >>              end else
> > > > > > >>              if (CE) begin
> > > > > > >>                  P <= p;
> > > > > > >>              end
> > > > > > >>         end
> > > > > > >>
> > > > > > >>         endmodule
> > > > > > >>         // synthesis translate_on
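
For anyone who wants to sanity-check the decomposition in the quoted module
without a Verilog simulator, here is a bit-level model in Python.  It is only
a sketch that follows the four quoted pipeline stages (ignoring the
registers), not the original testbench; the names to_signed and
model_multiply_35x35 are mine.

```python
MASK35 = (1 << 35) - 1
MASK36 = (1 << 36) - 1
MASK70 = (1 << 70) - 1


def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def model_multiply_35x35(a, b):
    """Model the quoted module for signed 35-bit a and b."""
    # Stage 0: split into signed upper 18 bits and unsigned lower 17 bits.
    a_hi, a_lo = a >> 17, a & 0x1FFFF
    b_hi, b_lo = b >> 17, b & 0x1FFFF
    p0 = (a_hi * b_hi) & MASK36          # 36-bit two's complement
    p1 = a_lo * b_lo                     # fits in 34 unsigned bits
    p2 = a_hi * b_lo
    p3 = a_lo * b_hi

    # Stage 1: sum the two middle terms in 36 bits.
    middle = (p2 + p3) & MASK36

    # Stage 2: build both 70-bit addends and add the lower 35 bits.
    word_a = (p0 << 34) | p1
    word_b = (to_signed(middle, 36) << 17) & MASK70   # sign-extended
    lower_sum = (word_a & MASK35) + (word_b & MASK35)  # bit 35 is the carry
    upper_a, upper_b = word_a >> 35, word_b >> 35

    # Stage 3: add the upper 35 bits plus the carry out of the lower half.
    upper_sum = (upper_a + upper_b + (lower_sum >> 35)) & MASK35
    return to_signed((upper_sum << 35) | (lower_sum & MASK35), 70)


if __name__ == "__main__":
    import random

    random.seed(1)
    for _ in range(10000):
        a = random.randrange(-(1 << 34), 1 << 34)
        b = random.randrange(-(1 << 34), 1 << 34)
        assert model_multiply_35x35(a, b) == a * b
    print("model matches a * b on 10000 random signed 35-bit pairs")
```

The {upper_a, 1'b1} + {upper_b, lower_sum[35]} trick in stage 3 is modeled
directly as upper_a + upper_b + carry, since the extra low bit only exists
to inject the carry.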
> > > > > > >>
> > > > > > >>
> > > > > > >>         --
> > > > > > >>         Timothy Normand Miller, PhD
> > > > > > >>         Assistant Professor of Computer Science, Binghamton University
> > > > > > >>         http://www.cs.binghamton.edu/~millerti/
> > > > > > >>         Open Graphics Project
> > > > > > >>
> > > > > > >>         _______________________________________________
> > > > > > >>         Open-graphics mailing list
> > > > > > >>         [email protected]
> > > > > > >>         http://lists.duskglow.com/mailman/listinfo/open-graphics
> > > > > > >>         List service provided by Duskglow Consulting, LLC
> > > > > > >>         (www.duskglow.com)
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > --------------------------------------------------------------------------
> > > > > Troy Benjegerdes                'da hozer'                 [email protected]
> > > > >
> > > > > Someone asked me why I work on this free (http://www.fsf.org/philosophy/)
> > > > > software & hardware (http://q3u.be) stuff and not get a real job.
> > > > > Charles Schulz had the best answer:
> > > > >
> > > > > "Why do musicians compose symphonies and poets write poems? They do it
> > > > > because life wouldn't have any meaning for them if they didn't. That's why
> > > > > I draw cartoons. It's my life." -- Charles Schulz
> > > > >
> > > >
> > > >
> > > >
> > >
> > >
> >
> >
> >
>
>



-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project

_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
