On Wednesday 09 February 2005 23:23, Lourens Veen wrote:
> On Wednesday 09 February 2005 21:19, Daniel Phillips wrote:
> > I vote for 18 bits, which would guarantee 17 bit precision. The 17th
> > bit would only live for a short while, being fed immediately into the
> > perspective correction multipliers. To me, every bit of extra
> > precision is gold, so if it's easy to get, get it.
>
> <snip>
>
> Anyway, these can be special cased, and then we have 17-bit precision
> (using two RAM blocks). Maybe I can even get rid of the problem by storing
> values in one table and differences in the other. Differences can then be
> in 8.10 fixed format, which may give enough precision to prevent the
> problem. Or I could do the same thing I'm doing for the 14/4 one-read
> version and have 19 bits value and 17 bits difference over the two tables.
> That will definitely give 17 bits of precision, and it saves a subtract
> operation.

Replying to self due to a sudden brainwave: 36 = 18 + 9 + 9.

Let's assume we take two RAM blocks. An extra RAM block will improve
performance a lot, and there are 40 of them, so I'd say we can afford it.

Now, my 14/4 encoding almost gets 13 bits. I haven't tried, but I think I
can squeeze an actual 13 bits out of 14/5. If I can get 13 bits of precision
out of 14/5, then I can probably also get 17 bits of precision out of 18/9
(since our earlier theoretical work shows that the limitations of linear
interpolation only come into play from 23 bits onwards). That leaves us with
9 bits left over if we read 36 bits per pixel.

What I'd like to suggest is an 18/9/9 encoding:

    value / (difference - bias) / (difference*3) >> 2

That looks rather esoteric, so let me explain.

We want 17 bits of precision, so we need 18-bit values for the start of the
span. The difference to the next value (the beginning of the next span) will
be between 128 and 512 for 18-bit values, so we can cram it into 9 bits
using a small bias. 512*3 = 1536, so we need 11 bits to store difference*3,
which we don't have, so we drop the two least significant bits. The LSB is
the same as that of difference (so we don't need to store it again), and the
second least significant bit can be calculated fairly easily from the two
least significant bits of difference.

So, our reciprocal routine gets a value v, a difference d, and d*3. Now what
it needs to calculate is v + (d * f), where f is the fraction of the input
mantissa, which is 16 - 10 = 6 bits. Writing d * f as an addition, it is

    d*f0 + (d*2)*f1 + (d*4)*f2 + (d*8)*f3 + (d*16)*f4 + (d*32)*f5

with fn the n'th bit of f. Now, we have d and d*3, and we can easily obtain
d*2 as d << 1. Now rewrite the whole thing as

    d*(2*f1 + f0) + 4*d*(2*f3 + f2) + 16*d*(2*f5 + f4)

and note how 2*fn + f(n-1) is in the range [0,3]. Multiplying this by d is
just a simple MUX selection of 0, d, d << 1 or d*3. Multiplying by 4 or 16
is a fixed shift, which is free.
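
The unpacking and MUX-based multiply described above can be sketched in
Python as a software model. This is only an illustration under assumptions:
the bias of 128, the order of the fields inside the 36-bit word, and all
function names are made up for the example, and the fixed-point alignment
of d*f against v is left out because it isn't specified here.

```python
BIAS = 128  # assumed bias; the text only says "a small bias"


def pack_entry(v, d):
    """Pack one table entry as 18 bits value, 9 bits (d - BIAS), 9 bits (d*3)>>2.

    The field order inside the 36-bit word is an assumption.
    """
    assert 0 <= v < (1 << 18)
    assert 0 <= d - BIAS < (1 << 9)
    d3_hi = (3 * d) >> 2          # drop the two least significant bits
    assert d3_hi < (1 << 9)
    return (v << 18) | ((d - BIAS) << 9) | d3_hi


def unpack_entry(word):
    """Recover v, d and the full d*3 from a packed 36-bit word."""
    v = word >> 18
    d = ((word >> 9) & 0x1FF) + BIAS   # the simplified adder for the bias
    d3_hi = word & 0x1FF
    bit0 = d & 1                       # LSB of d*3 equals LSB of d
    bit1 = ((d >> 1) ^ d) & 1          # bit 1 of d*3 is d1 XOR d0
    d3 = (d3_hi << 2) | (bit1 << 1) | bit0
    return v, d, d3


def mux_multiply(d, d3, f):
    """Compute d * f for a 6-bit f with three 4:1 selections and two adds."""
    choices = (0, d, d << 1, d3)       # 0, d, d*2, d*3
    p0 = choices[f & 3]                # selected by 2*f1 + f0
    p1 = choices[(f >> 2) & 3]         # selected by 2*f3 + f2
    p2 = choices[(f >> 4) & 3]         # selected by 2*f5 + f4
    return p0 + (p1 << 2) + (p2 << 4)  # fixed shifts by 2 and 4 bits


def interpolate(word, f):
    """Return v + d*f as written in the text, without any rescaling."""
    v, d, d3 = unpack_entry(word)
    return v + mux_multiply(d, d3, f)
```

For example, interpolate(pack_entry(1000, 200), 37) gives the same result
as 1000 + 200*37, using only selections, shifts and additions.
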
That means that we would be able to do the multiplication with two adders
and three MUXes. Then we need to add the result to the value, which takes
another adder, and we need a simplified adder (for the bias, with a fixed
second parameter) and a few extra gates to get the values out of the table.

A 17-bit reciprocal using two RAM blocks, four adders, three MUXes and a few
loose gates doesn't sound too bad. And it won't block any multipliers
either.

I'm going to try it out tomorrow. I'm fairly sure I can get 16 bits (and
maybe even drop the bias, and thus an adder, in the process), but I'm not so
sure I can get 17 bits. On the other hand, if we only put 16 bits in, won't
the extra bit in the output be meaningless anyway?

Lourens
