On Tuesday 08 February 2005 18:36, Daniel Phillips wrote:
> Hi all,
>
> OK, there were three goofs in the above analysis:
>
> 1) I should have used the interval [1, 2), which is the correct
> interval for a normalized floating point mantissa (not [.5, 1])
Well, it's all pretty much moot anyway: in the implementation, the interval is
[0, 65535] for the 16-bit mantissa. Maybe I should explain what my float25
class (it's in svn now, I understand) is doing right now.
Here is the code that fills the table:
void float25::initrecptables() {
    int v = 0, v2 = 0, d = 0, d2 = 0;

    // Entry 0 is hard-coded; its near-end value would overflow 16 bits.
    recp_table1[0] = (16352 << 4) | 15;
    for (int i = 1; i < 1024; ++i) {
        // Value at the far end of the segment, and the difference with
        // the value at the near end.
        v = recp_mantissa_ref((i << 6) + 0x40);
        d = recp_mantissa_ref(i << 6) - v;
        // Compensate for the rounding of d (explained below).
        switch (d & 0x7) {
        case 2: v += 1; break;
        case 3: v += 2; break;
        case 4:
        case 5: v -= 2; break;
        case 6: v -= 1; break;
        }
        v2 = (v + 2) >> 2;        // round v to 14 bits
        d2 = ((d + 4) >> 3) - 1;  // round d to 4 bits, with bias
        recp_table1[i] = (v2 << 4) | d2;
    }
}
First, I calculate the value v at the low end of the current interval (the
back end, since it is a descending function), and the difference d between it
and the value at the high (front) end of the interval.
It turns out that d ranges over the interval [32, 128], so we would need
eight bits to represent it raw. Since the actual range contains fewer than
128 values, however, we can encode it in seven bits if we apply a bias.
Effectively, we use a bias of -8, so the range becomes [24, 120]. So far so
good.
Now v is 16 bits and d is 7 bits, so we need 23 bits for each entry, but we
only have 18. Since the interpolation depends on both values, it does not pay
to have one of them more precise than the other, so to shorten things we drop
an equal number of bits from d and v. Chopping off (well, with rounding) the
three least significant bits from each gives us 13 + 4 = 17 bits to put in
the LUT. But now we have one bit to spare.
So, I added that extra bit to the value part, giving a 14 + 4 scheme. Now
comes the tricky bit. The difference, d, encodes the steepness of the
segment, and we drop three bits from it. That is, we calculate d % 8, and if
the result is below 4 we round down, otherwise we round up. If d % 8 is zero,
we don't lose anything by rounding down, but for two and three the difference
becomes lower than it should be. That is, the line is less steep. The theory
behind that switch statement is that we can partially compensate for this by
making the low value a bit larger. That way, the error range will still be
the same, but it will be more even and the maximum error decreases. Or if you
wish, we store part of the information that is lost when we round d in the
extra bit of v. The actual values have been found by trial and error, and
yes, it does in fact help :-).
This is the actual reciprocal function:
float25 float25::approx_recp() const {
    unsigned int base, difference, invfrac;
    unsigned int mantissa, exponent, sign;
    unsigned int t;
    float25 ret;

    sign = bits & 0x80000000;
    exponent = (bits >> 23) & 0xFF;
    mantissa = (bits >> 7) & 0xFFFF;
    if (mantissa != 0) {
        t = recp_table1[mantissa >> 6];
        base = (t >> 2) & ~0x3;                    // v, back to 16 bits
        difference = (t & 0xF) + 1;                // ~d/8, bias removed
        invfrac = ((mantissa & 0x3F) ^ 0x3F) + 1;  // inverted fraction
        mantissa = base + ((difference * invfrac) >> 3);
    }
    exponent = ((~exponent) - 2) & 0xFF;           // new exponent: 253 - old
    ret.bits = sign | (exponent << 23) | (mantissa << 7);
    return ret;
}
We keep the sign, and the new exponent is easy to calculate. For the mantissa,
we read out the table entry, extract v and d (not forgetting the bias), and
then the rest is just a simple linear interpolation. Note that we invert the
fraction (0 becomes 63, 63 becomes 0), since we store the value for the far
end of the segment. The reason is that v would otherwise overflow: if we
stored the near-end value, the first entry would have to contain 0x10000, and
that's one bit too many.
> 2) My decimal math implied 500 linear segments in the interval,
> whereas I meant to have 1000
Doh! I missed that too...
> 3) As I noted earlier, the curvature does in fact vary considerably
> over the interval, affecting the maximum error.
>
> The corrected formulas show that interpolation accuracy is considerably
> better than I'd originally estimated. Viktor Pracht pointed out that
> curvature is expressed by the second derivative (duh) which is 1/x**3,
> so the empirical result that error is eight times worse at one end of
> the function than the other is correct. Fortunately, this is a win for
> x>1.
>
> So, near 1:
> >>> 1/1.0005 - (1/1.001 + 1)/2
>
> -2.4962543709872165e-07
>
> that is, one part in about 4 million or roughly 22 bits of accuracy as
>
> before. And near 2:
> >>> 1/1.9995 - (1/1.999 + .5)/2
>
> -3.1273451162050492e-08
>
> that is, about one part in 32 million, or 25 bits of accuracy. This is
> entirely satisfactory for our purposes.
Yep. I'd already found out empirically that when doing linear interpolation
between 16-bit values I get 15 bits of precision. OTOH, I guess it should be
possible to make it 16... I'll have a look. Or maybe using 18-bit words in
the LUT could improve that to 16 bits. The problem is that you need to read
two values per pixel, and the RAM is dual-ported, so we can't do two pixels
at the same time, unless we use two RAM blocks, one table for each pixel. How
scarce are these things?
Lourens
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)