Hi Lourens,

On Friday 11 February 2005 12:08, Lourens Veen wrote:
> I'm a bit worried about the double precision floating point divide in your
> inner loop...

Since 'precision' is a power of two, it is just a shift and even the shift 
will disappear when converted to logic.  This is merely the int to float 
conversion.

> Also, why is frac_bits 2? If you have a 16-bit input, and a 10-bit table
> index, then surely you have a 6-bit fraction for interpolation?

I print out the error for every possible input, so if I allow too many 
inputs it takes too long to run.  The number of fact_bits mainly affects 
the number of distinct inputs and affects the error very little.  If you 
set it to 6 you will get essentially the same results.

> I'm not sure I quite understand the code.

Sorry!  I thought it was too short to need any explaination.  (Just joking.)

My less precise interpolation is done with the expression:

   sample - ((j*delta) >> frac_bits)

I subtract the ratio rather than add because I want to use an unsigned 
multiply.  I know the function decreases monotonically, so I obtain delta 
by reverse subtraction to give a positive input to the multiply.  This 
simple form gives an error range of [-3.809740e-06, 2.898030e-06] which is 
plus or minus a little less than 1/2**18.  That's 17 bits of accuracy, but 
I'm greedy so I wanted to see if I could get 18.

I got part of the way there with yesterday's code.  Today's code (attached) 
does the rest of the job.  Instead of throwing away part of the product, I 
shift the sample point before adding, thus gaining frac_bits of extra 
precision, just enough to hold the exact product.  This practically 
eliminates the interpolation error, leaving only the sample error, which we 
can't do much about.  The error is now [-3.809740e-06, 6.415877e-08] which 
is mostly negative due to sample quantization, with a tiny positive error 
due to interpolation.  So I round by adding precision/2 (expressed as an 
integer in terms of frac_bits) giving:

   (sample << frac_bits) - j*delta  + (1 << (frac_bits - 1));

This shifts the improved error range to [-1.902391e-06, 1.971507e-06] or 
just a tiny bit more than 1/2**19.  So I got my full 18 bits of precision, 
less a little sliver that isn't worth worrying about.

Just to be sure, I wrote:

   if ((int)(error*precision))
       goto quit;

which aborts if any result shows error in the 18th bit.  My less precise 
interpolation passes this test too, but only by cheating: both positive and 
negative error are truncated towards zero, so it will survive an error 
range of plus or minus 2**-18, which is only 17 bit precision.

With a little more tweaking, it is no doubt possible to eliminate most of 
the logic needed for the extra frac_bits of precision in the sum.  Maybe 
the logic synthesizer will take care of that automatically.  I'm quite 
satisfied with the results as they are.  As I mentioned yesterday, if 
there's actually a use for 19 bits of precision, we could get it by not 
storing the most significant bit of the sample, which is always 1.

Regards,

Daniel
#include <stdio.h>
#include <math.h>
#define table_size 1024
#define sample_bits 18
#define frac_bits 2

int main(void)
{
    double neg_error = 0, pos_error = 0;
    int table[table_size + 1];
    int precision = (1 << sample_bits);
    int i, j;

    for (i = 0; i <= table_size; i++)
        table[i] = (1 << sample_bits)/(1 + ((float)i / table_size));

    for (i = 0; i < table_size; i++) {
        int sample = table[i], delta = sample - table[i + 1];
        printf("%i: [0x%x] error: ", i, sample);

        for (j = 0; j < 1 << frac_bits; j++) {
#if 1
            // more precise: keep extra interpolation bits around
            int interp = (sample << frac_bits) - ((j * (long long)delta));
            double approx = (double)interp / (precision << frac_bits);
#else
            // less precise: truncate interp to same length as sample
            int interp = sample - ((j * (long long)delta) >> frac_bits);
            double approx = (double)interp / precision;
#endif
            double exact = 1/((double)((i << frac_bits) + j) /
                (double)(table_size << frac_bits) + 1);
            double error = approx - exact;

            printf("%.2e ", error);
            if (error > pos_error)
                pos_error = error;
            if (error < neg_error)
                neg_error = error;
            if ((int)(error*precision))
                goto quit;
        }
        printf("\n");
    }
quit:
    printf("max error = [%e, %e]\n", neg_error, pos_error);
    return 0;
}
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)

Reply via email to