On 5/7/09, Andre Pouliot <[email protected]> wrote:
>
> 2009/5/5 Timothy Normand Miller <[email protected]>
>
> > Here's a first crack at an important part of the rasterizer. There
> > are something on the order of 20 different numbers that have to be
> > incremented every cycle. We're rasterizing triangles here, so for
> > instance, we have X1, which is the left edge, and X2, which is the
> > right edge. They also have increments dX1 and dX2 that are added to
> > X1 and X2 to advance their values for one scanline to the appropriate
> > values for the next. Almost all of these are 32-bit floats.
>
> Weren't we using 25-bit floats internally?

Yeah, but there were concerns about it not being sufficient. The reason
to do 25 is that the multipliers had to be 17-bit. But the rasterizer
is just sums. We'll drop bits off further down the pipeline.

> > Now, we don't want to be doing full FP adds every cycle. That's too
> > expensive, and totally unnecessary. Except for unusual circumstances,
> > the exponent would usually not change, and when it does, it's by one.
> > So what we can do is pre-process the floats coming in from the host,
> > pre-aligning the base (X1) and increment (dX1dY) so they are
> > denormalized and have the same exponent. This process can be fully
> > pipelined and be transparent to the host. The preprocessor would hold
> > the original X1 and dX1dY values, and whenever either is updated, the
> > shifts are processed, and the aligned working values are forwarded to
> > the actual rasterizer. The nice thing here is that this alignment
> > logic can be shared among all base/increment pairs, thereby cutting
> > out a lot of logic necessary to do normalized floats. Yet we
> > sacrifice no precision.
>
> By doing the preprocessing first and keeping only a partial value of the dx
> or dy that is pre-aligned in the mantissa, we lose precision, especially if
> we do a lot of successive operations.

We keep the original values and re-align every time either one of them
is overwritten by the host.

> > Below is a first stab at the logic that would do this math. It is
> > able to handle sign changes (can only happen when the delta and base
> > are opposite signs). It also will only shift right.
> >
> > I'm pretty sure there's little point in shifting left. This can
> > happen when the base is large, and the delta is a small negative.
> > Where the left-shifting would come in handy would be when the Y
> > rasterizer forwards results to the X rasterizer. But really, there
> > would also be a preprocessor for the X rasterizer that would align X1
> > with dX1dX.
> > It's pointless to shift left in the X rasterizer, because
> > we wouldn't want to try to recover any precision that was lost in the
> > delta when it was shifted right in preprocessing.
> >
> > What follows is a first stab at this component. Following that, more
> > discussion about the next version.
> >
> > // This module takes two pre-shifted, denormalized, and aligned floating point
> > // numbers and produces a sum. The sum may be shifted by one, and the
> > // increment may also get shifted.
> > // See http://www.cs.nmsu.edu/~pfeiffer/classes/473/notes/fp-extras.html
> > // for info on guard bits. We expect three guard bits, the right-most of which
> > // is the inclusive OR of all bits shifted off the right. This gives us
> > // 23+1+3=27 mantissa bits
> > // This block may shift right when the base gets too big, but it will never
> > // shift left, because there's no point.
> > // Every cycle, the outputs should be registered and fed back in as inputs.
> > module raster_sum(
> >     // In-coming addends. 'a' is the base that gets incremented.
> >     // 'b' is the delta.
> >     input [7:0] exp_in, // Starting exponent (for both numbers)
> >     input sign_a_in,
> >     input sign_b_in,
> >     input [26:0] mantissa_a_in,
> >     input [26:0] mantissa_b_in,
> >
> >     // Outputs
> >     // The sign of 'b' never changes, so we don't output that. Anything else
> >     // can change.
> >     output reg [7:0] exp_out,
> >     output reg sign_a_out,
> >     output reg [26:0] mantissa_a_out,
> >     output reg [26:0] mantissa_b_out);
> >
> > wire [28:0] sum1;
> > addsub ad(.a({2'b0, mantissa_a_in}),
> >           .b({2'b0, mantissa_b_in}),
> >           .subtract(sign_a_in ^ sign_b_in),
> >           .c(sum));
>
> Should be ".c(sum1));"

Thanks!

> > wire [27:0] inverse = -sum1[27:0];
> > wire [7:0] next_exp = exp_in + 1;
> >
> > always @(*) begin
> >     case (sum1[28:27])
> >         0: begin // No shift, no sign change
> >             exp_out = exp_in;
> >             sign_a_out = sign_a_in;
> >             mantissa_a_out = sum1[26:0];
> >             mantissa_b_out = mantissa_b_in;
> >         end
> >         1: begin // Overflow, shift right
> >             exp_out = next_exp;
> >             sign_a_out = sign_a_in;
> >             mantissa_a_out = sum1[27:1];
> >             mantissa_b_out = mantissa_b_in[26:1];
> >         end
> >         2: begin // Sign change and overflow, shift right
> >             exp_out = next_exp;
> >             sign_a_out = !sign_a_in;
> >             mantissa_a_out = inverse[27:1];
> >             mantissa_b_out = mantissa_b_in[26:1];
> >         end
> >         3: begin // Sign change, no overflow
> >             exp_out = exp_in;
> >             sign_a_out = !sign_a_in;
> >             mantissa_a_out = inverse[26:0];
> >             mantissa_b_out = mantissa_b_in[26:0];
> >         end
> >     endcase
> > end
> >
> > endmodule
> >
> > This logic isn't going to be fast enough. We'd be lucky if the 29-bit
> > addsub could hit 100MHz in the S3. Normally, we'd pipeline this, but
> > we need to increment every cycle, and one stage of pipelining would
> > cause results to be produced once every other cycle. However, there
> > are some tricks we can play. We want this to USUALLY produce a result
> > every cycle, but it's okay for it to skip a cycle now and then, as
> > long as it's not too often. The shifts are relatively cheap, but that
> > sign change isn't. So I think the next step is to include the holding
> > registers in the module, and have the module produce a "valid" bit for
> > the output. On those occasions when a shift or sign change has to
> > happen, a cycle is inserted where there's no valid output. Then all we
> > have to worry about is synchronizing all 20 of these units, because
> > they'll go invalid at different times. I have some ideas for that
> > too.
>
> To do a pipeline we could always interleave 2 or 3 rasters on the same unit.
> The control logic would cost a little bit more.
> But it would still be
> relatively small, and if it helps with the speed we could just not care too
> much about it. For the negated value, do we really need to care about that?
> The color value can't go negative; the rest of the data I don't know, maybe
> the XY value?
>
> > There's another option worth discussing. Let these sums take two
> > cycles. What you need is the parameter (X1), the parameter advanced
> > by one step (X1+dX1dY), and two times the delta (2*dX1dY). On
> > alternating cycles, you pass X1 and X1_next down the pipeline, thereby
> > producing the right sequence of outputs, also updating each of those
> > counters every other cycle. In fact, that might be the better option.
> > Discuss!
>
> That's another possibility, but to calculate the current and the next value
> it would require that we precompute the (2*dX1dY). If we interleave we could
> interleave the X1+dXdY and the Y1+dXdY or X1+dXdY and the other line
> X1+dXdY1.

Precomputing the 2x is as simple as adding 1 to the exponent, and we
can fully pipeline that first sum too. It's only when there's a
feedback loop that we have the problem I'm trying to deal with here,
preventing us from fully pipelining.

> > --
> > Timothy Normand Miller
> > http://www.cse.ohio-state.edu/~millerti
> > Open Graphics Project
> > _______________________________________________
> > Open-graphics mailing list
> > [email protected]
> > http://lists.duskglow.com/mailman/listinfo/open-graphics
> > List service provided by Duskglow Consulting, LLC (www.duskglow.com)

--
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project
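
For reference, the two-cycle option discussed in the thread (pass X1 and X1_next on alternating cycles, each advanced by 2*dX1dY) could look roughly like the sketch below. It is only an illustration and not code from the thread. The module name raster_interleave, its ports, the settle bubble after a load, and the use of plain pre-aligned fixed-point values instead of the sign/exponent/mantissa form of raster_sum are all assumptions. Each accumulator is written only every other cycle, so its feedback add would have two clock periods to settle, provided the synthesis tool is given a matching multicycle constraint.

```verilog
// Hypothetical sketch, not from the original post.
// Values are assumed to be pre-aligned fixed point, so 2*delta is a one-bit
// left shift here (in the float form it would be exponent + 1).
module raster_interleave #(parameter W = 32) (
    input              clk,
    input              load,     // latch a new base/delta pair
    input      [W-1:0] base,     // X1
    input      [W-1:0] delta,    // dX1dY
    output reg [W-1:0] value,    // base + n*delta, one new value per cycle
    output reg         valid     // low during the bubble after a load
);
    reg [W-1:0] acc_even;        // base + 2k*delta
    reg [W-1:0] acc_odd;         // base + (2k+1)*delta
    reg [W-1:0] delta2;          // 2*delta
    reg         phase  = 1'b0;   // 0: emit even step, 1: emit odd step
    reg         settle = 1'b0;   // one idle cycle after load
    reg         armed  = 1'b0;   // set by the first load

    initial valid = 1'b0;

    always @(posedge clk) begin
        if (load) begin
            acc_even <= base;
            acc_odd  <= base + delta;  // first sum: no feedback, could be pipelined
            delta2   <= delta << 1;
            phase    <= 1'b0;
            settle   <= 1'b1;
            armed    <= 1'b1;
            valid    <= 1'b0;
        end else if (settle) begin
            // Bubble so that acc_even is at least two cycles old
            // before its first feedback add is registered.
            settle   <= 1'b0;
        end else if (armed) begin
            valid    <= 1'b1;
            if (phase == 1'b0) begin
                value    <= acc_even;
                acc_even <= acc_even + delta2;  // written every other cycle
            end else begin
                value    <= acc_odd;
                acc_odd  <= acc_odd + delta2;   // written every other cycle
            end
            phase    <= ~phase;
        end
    end
endmodule
```

After a load, the output sequence is base, base + delta, base + 2*delta, and so on, one per cycle once valid goes high. Shifts and sign changes, which raster_sum handles, are left out of this sketch.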
