Re: [Open-graphics] Essential component of the rasterizer, pre-shifted sums

Timothy Normand Miller Thu, 07 May 2009 19:43:28 -0700

On 5/7/09, Andre Pouliot <[email protected]> wrote:
>
>
> 2009/5/5 Timothy Normand Miller <[email protected]>
> > Here's a first crack at an important part of the rasterizer.  There
> > are something on the order of 20 different numbers that have to be
> > incremented every cycle.  We're rasterizing triangles here, so for
> > instance, we have X1, which is the left edge, and X2, which is the
> > right edge.  They also have increments dX1 and dX2 that are added to
> > X1 and X2 to advance their values for one scanline to the appropriate
> > values for the next.  Almost all of these are 32-bit floats.
>
> Weren't we using 25-bit floats internally?


Yeah, but there were concerns about it not being sufficient.  The
reason for 25 is that the multipliers had to be 17-bit.  But the
rasterizer is just sums.  We'll drop bits off further down the
pipeline.

>
> >
> >
> > Now, we don't want to be doing full FP adds every cycle.  That's too
> > expensive, and totally unnecessary.  Except for unusual circumstances,
> > the exponent would usually not change, and when it does, it's by one.
> > So what we can do is pre-process the floats coming in from the host,
> > pre-aligning the base (X1) and increment (dX1dY) so they are
> > denormalized and have the same exponent.  This process can be fully
> > pipelined and be transparent to the host.  The preprocessor would hold
> > the original X1 and dX1dY values, and whenever either is updated, the
> > shifts are processed, and the aligned working values are forwarded to
> > the actual rasterizer.  The nice thing here is that this alignment
> > logic can be shared among all base/increment pairs, thereby cutting
> > out a lot of logic necessary to do normalized floats.  Yet we
> > sacrifice no precision.
>
> By doing the preprocessing first and keeping only a partial value of the dX
> or dY, prealigned in the mantissa, we lose precision, especially if we
> do a lot of successive operations.

We keep the original values and re-align every time either one of them
is overwritten by the host.
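
To make the re-alignment step concrete, here is a rough sketch of what the
preprocessor might do for one base/increment pair.  This is only an
illustration under stated assumptions: the module name "prealign" and its
port names are made up, inputs are assumed to be IEEE-754 single precision,
denormals are flushed to zero, infinities and NaNs are ignored, and the
pipeline registers are left out.  The 27-bit mantissa layout (hidden bit,
23 fraction bits, 3 guard bits with the right-most one sticky) matches what
raster_sum expects.

```verilog
// Hypothetical pre-aligner; names and ports are illustrative only.
// Assumes IEEE-754 single-precision inputs, flushes denormals to zero,
// and does not handle infinities or NaNs.
module prealign(
    input  [31:0] base,        // e.g. X1, as written by the host
    input  [31:0] delta,       // e.g. dX1dY, as written by the host
    output [7:0]  exp_out,     // Shared exponent (the larger of the two)
    output        sign_a_out,
    output        sign_b_out,
    output [26:0] mantissa_a_out,
    output [26:0] mantissa_b_out);

// Shift right by n, ORing every bit shifted off the right into the
// right-most (sticky) guard bit.
function [26:0] shr_sticky;
    input [26:0] m;
    input [7:0]  n;
    reg   [53:0] wide;
    begin
        if (n > 27) begin
            // Everything was shifted off; only the sticky bit survives.
            shr_sticky = {26'b0, |m};
        end else begin
            // Upper 27 bits hold the result, lower 27 the shifted-off bits.
            wide = {m, 27'b0} >> n;
            shr_sticky = {wide[53:28], wide[27] | (|wide[26:0])};
        end
    end
endfunction

wire [7:0] exp_a = base[30:23];
wire [7:0] exp_b = delta[30:23];

// Hidden bit, 23 fraction bits, 3 guard bits (initially zero).
wire [26:0] man_a = (exp_a == 0) ? 27'b0 : {1'b1, base[22:0],  3'b0};
wire [26:0] man_b = (exp_b == 0) ? 27'b0 : {1'b1, delta[22:0], 3'b0};

wire       a_is_bigger = (exp_a >= exp_b);
wire [7:0] diff        = a_is_bigger ? (exp_a - exp_b) : (exp_b - exp_a);

// The operand with the smaller exponent is denormalized to match the
// larger one; the other passes through untouched.
assign exp_out        = a_is_bigger ? exp_a : exp_b;
assign sign_a_out     = base[31];
assign sign_b_out     = delta[31];
assign mantissa_a_out = a_is_bigger ? man_a : shr_sticky(man_a, diff);
assign mantissa_b_out = a_is_bigger ? shr_sticky(man_b, diff) : man_b;

endmodule
```

Because the original host values are held and this runs again on every host
write, any bits dropped by the right shift are only lost for the current
working copy, not from the stored parameters.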

>
>
> >
> >
> >
> > Below is a first stab at the logic that would do this math.  It is
> > able to handle sign changes (can only happen when the delta and base
> > are opposite signs).  It also will only shift right.
> >
> > I'm pretty sure there's little point in shifting left.  This can
> > happen when the base is large, and the delta is a small negative.
> > Where the left-shifting would come in handy would be when the Y
> > rasterizer forwards results to the X rasterizer.  But really, there
> > would also be a preprocessor for the X rasterizer that would align X1
> > with dX1dX.  It's pointless to shift left in the X rasterizer, because
> > we wouldn't want to try to recover any precision that was lost in the
> > delta when it was shifted right in preprocessing.
> >
> > What follows is a first stab at this component.  Following that, more
> > discussion about the next version.
> >
> >
> >
> >
> >
> > // This module takes two pre-shifted, denormalized, and aligned
> > // floating point numbers and produces a sum.  The sum may be shifted
> > // by one, and the increment may also get shifted.
> > // See http://www.cs.nmsu.edu/~pfeiffer/classes/473/notes/fp-extras.html
> > // for info on guard bits.  We expect three guard bits, the right-most
> > // of which is the inclusive OR of all bits shifted off the right.
> > // This gives us 23+1+3=27 mantissa bits.
> > // This block may shift right when the base gets too big, but it will
> > // never shift left, because there's no point.
> > // Every cycle, the outputs should be registered and fed back in as
> > // inputs.
> > module raster_sum(
> >    // In-coming addends.  'a' is the base that gets incremented.
> >    // 'b' is the delta.
> >    input [7:0] exp_in,     // Starting exponent (for both numbers)
> >    input sign_a_in,
> >    input sign_b_in,
> >    input [26:0] mantissa_a_in,
> >    input [26:0] mantissa_b_in,
> >
> >    // Outputs
> >    // The sign of 'b' never changes, so we don't output that.  Anything
> >    // else can change.
> >    output reg [7:0] exp_out,
> >    output reg sign_a_out,
> >    output reg [26:0] mantissa_a_out,
> >    output reg [26:0] mantissa_b_out);
> >
> > wire [28:0] sum1;
> > addsub ad(.a({2'b0, mantissa_a_in}),
> >          .b({2'b0, mantissa_b_in}),
> >          .subtract(sign_a_in ^ sign_b_in),
> >          .c(sum));
>
> Should be ".c(sum1));"

Thanks!

> >
> >
> >
> > wire [27:0] inverse = -sum1[27:0];
> > wire [7:0] next_exp = exp_in + 1;
> >
> > always @(*) begin
> >    case (sum1[28:27])
> >        0: begin  // No shift, no sign change
> >            exp_out = exp_in;
> >            sign_a_out = sign_a_in;
> >            mantissa_a_out = sum1[26:0];
> >            mantissa_b_out = mantissa_b_in;
> >        end
> >        1: begin  // Overflow, shift right
> >            exp_out = next_exp;
> >            sign_a_out = sign_a_in;
> >            mantissa_a_out = sum1[27:1];
> >            mantissa_b_out = mantissa_b_in[26:1];
> >        end
> >        2: begin  // Sign change and overflow, shift right
> >            exp_out = next_exp;
> >            sign_a_out = !sign_a_in;
> >            mantissa_a_out = inverse[27:1];
> >            mantissa_b_out = mantissa_b_in[26:1];
> >        end
> >        3: begin  // Sign change, no overflow
> >            exp_out = exp_in;
> >            sign_a_out = !sign_a_in;
> >            mantissa_a_out = inverse[26:0];
> >            mantissa_b_out = mantissa_b_in[26:0];
> >        end
> >    endcase
> > end
> >
> > endmodule
> >
> >
> >
> > This logic isn't going to be fast enough.  We'd be lucky if the 29-bit
> > addsub could hit 100MHz in the S3.  Normally, we'd pipeline this, but
> > we need to increment every cycle, and one stage of pipelining would
> > cause results to be produced once every other cycle.  However, there
> > are some tricks we can play.  We want this to USUALLY produce a result
> > every cycle, but it's okay for it to skip a cycle now and then, as
> > long as it's not too often.  The shifts are relatively cheap, but that
> > sign change isn't.  So I think the next step is to include the holding
> > registers in the module, and have the module produce a "valid" bit for
> > the output.  On those occasions when a shift or sign change has to
> > happen, a cycle is inserted where there's no valid output.  Then all we
> > have to worry about is synchronizing all 20 of these units, because
> > they'll go invalid at different times.  I have some ideas for that
> > too.
> >
>
> To pipeline this we could always interleave 2 or 3 rasters on the same unit.
> The control logic would cost a little more, but it would still be
> relatively small, and if it helps with the speed we needn't care too
> much about it. As for the negated value, do we really need to handle that?
> The color values can't go negative; for the rest of the data I don't know,
> maybe the X/Y values?
>
> >
> > There's another option worth discussing.  Let these sums take two
> > cycles.  What you need is the parameter (X1), the parameter advanced
> > by one step (X1+dX1dY), and two times the delta (2*dX1dY).  On
> > alternating cycles, you pass X1 and X1_next down the pipeline, thereby
> > producing the right sequence of outputs, also updating each of those
> > counters every other cycle.  In fact, that might be the better option.
> >  Discuss!
> >
>
> That's another possibility, but to calculate the current and the next value
> it would require that we precompute the (2*dX1dY). If we interleave, we could
> interleave the X1+dXdY and the Y1+dXdY, or X1+dXdY and the other line's
> X1+dXdY1.

Precomputing the 2x is as simple as adding 1 to the exponent, and we
can fully pipeline that first sum too.  It's only when there's a
feedback loop that we have the problem I'm trying to deal with here,
which prevents us from fully pipelining.
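
For reference, the two-cycle scheme could look roughly like the sketch
below.  It is a simplified illustration, not the real thing: the module
name "interleaved_sum" and its ports are hypothetical, and it uses plain
fixed-point integers instead of the aligned sign/exponent/mantissa format,
so 2*delta is a left shift by one here rather than an exponent increment.
It also assumes the synthesis tool is told that each adder has two clock
cycles to settle (a multicycle path constraint).

```verilog
// Hypothetical two-accumulator interleave; names are illustrative only.
// Uses plain fixed-point integers to keep the sketch short.
module interleaved_sum #(parameter W = 32) (
    input              clk,
    input              load,     // Load new starting values from upstream
    input  [W-1:0]     x0,       // The parameter itself (X1)
    input  [W-1:0]     x1,       // X1 + dX1dY, computed upstream (no
                                 // feedback there, so fully pipelinable)
    input  [W-1:0]     delta2,   // 2*dX1dY
    output [W-1:0]     x_out);   // One new value every cycle

reg         phase;               // 0: output 'even', 1: output 'odd'
reg [W-1:0] even;                // X1, X1+2d, X1+4d, ...
reg [W-1:0] odd;                 // X1+d, X1+3d, X1+5d, ...

always @(posedge clk) begin
    if (load) begin
        even  <= x0;
        odd   <= x1;
        phase <= 1'b0;
    end else begin
        phase <= ~phase;
        // Each accumulator is only captured every other cycle, right
        // after its current value has been output, so its adder input
        // is stable for two full cycles before the result is needed.
        if (phase)
            odd  <= odd  + delta2;
        else
            even <= even + delta2;
    end
end

assign x_out = phase ? odd : even;

endmodule
```

After a load, x_out steps through x0, x1, x0+delta2, x1+delta2, and so on,
which is the same sequence a single accumulator adding dX1dY every cycle
would produce.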


> >
> >
> > --
> > Timothy Normand Miller
> > http://www.cse.ohio-state.edu/~millerti
> > Open Graphics Project
> > _______________________________________________
> > Open-graphics mailing list
> > [email protected]
> > http://lists.duskglow.com/mailman/listinfo/open-graphics
> > List service provided by Duskglow Consulting, LLC (www.duskglow.com)
> >
>
>


-- 
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project