On Mon, Dec 17, 2018 at 2:50 PM Thomas Kurz <sqlite.2...@t-net.ruhr> wrote:
> Ok, as there seem to be some experts about floating-point numbers here,
> there is one aspect that I never understood:
>
> floats are stored as a fractional part, which is binary encoded, and an
> integer-type exponent. The first leads to the famous rounding errors as
> there is no exact representation of most fractions.
>
> Can someone explain to me why it has been defined this way? Having 1 bit
> sign, 11 bit exponent, and 52 bit mantissa, I would have stored the (in
> the meantime well known) number 211496.26 as 21149626E-2, i.e. I would
> have stored a 52 bit integer number and an appropriate exponent. This way
> there should be no rounding errors and one would always have a guaranteed
> precision of ~15 significant digits.

To get the maximum precision possible from a binary floating-point number, the designers of the format took advantage of the fact that every number other than zero has a 1 bit set somewhere in its representation. To that end, "normal" floating-point numbers actually have a 53-bit mantissa.

"But that adds up to 65 bits! You can't cram 65 bits into a 64-bit word." You can, though, if the most significant set bit of the mantissa is implied just to the left of the 52 explicitly stored bits. The most significant digit of a decimal number can be any value from 1 through 9, so the same trick can't be used to extend the precision of a decimal floating-point number.

In addition to normal numbers there are subnormal numbers, where the implied leading bit is a 0 instead of a 1. The value zero happens to be a subnormal number with all bits set to zero.

Even without the implicit bit, many (if not most) schemes for encoding decimal digits in binary lose some portion of the range that a pure binary representation offers, and the IEEE designers wanted the best of both worlds, range and precision, so they gave up exact decimal representation in favor of binary.
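To make the implicit-bit trick concrete, here is a short Python sketch (mine, not from the thread) that unpacks the bit fields of an IEEE 754 double and reconstructs its value with the hidden 53rd bit made explicit. Infinities and NaNs (biased exponent 0x7FF) are deliberately not handled:

```python
import struct

def decompose(x: float):
    """Split a double into sign, biased exponent, and the
    52 explicitly stored mantissa bits."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    mantissa = bits & ((1 << 52) - 1)
    return sign, exponent, mantissa

def reconstruct(sign, exponent, mantissa):
    """Rebuild the value, making the implicit 53rd bit explicit.
    Does not handle exponent 0x7FF (inf/NaN)."""
    if exponent == 0:                       # subnormal: implied leading 0
        significand = mantissa
        exponent = 1
    else:                                   # normal: implied leading 1
        significand = (1 << 52) | mantissa
    # value = significand * 2^(exponent - bias - 52), bias = 1023
    return (-1) ** sign * significand * 2.0 ** (exponent - 1023 - 52)

s, e, m = decompose(211496.26)
print(reconstruct(s, e, m) == 211496.26)    # True: exact round trip
```

Note that the round trip is exact: the double that actually gets stored for 211496.26 is recovered bit for bit, even though that double is only the nearest representable value, not 211496.26 itself.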
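The rounding problem with decimal floating point can be seen directly with Python's decimal module, which implements base-10 floating point of the kind proposed in the question; setting the context to 15 significant digits mirrors the "~15 significant digits" figure above:

```python
from decimal import Decimal, getcontext

getcontext().prec = 15              # ~15 significant decimal digits

third = Decimal(1) / Decimal(3)     # rounds to 0.333333333333333
total = third + third + third
print(total)                        # 0.999999999999999, not 1
```

No matter how high the precision is set, 1/3 has no finite decimal expansion, so some rounding must occur and the error survives into the sum.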
Your coding approach is what the decimal type does on the .NET platform, among other examples, but the available range is smaller than that of IEEE binary floating-point numbers of the same size. Even so, a decimal scheme such as you suggest can still produce rounding errors that propagate. Simply add 1/3 + 1/3 + 1/3 in a decimal representation:

333333333333333E-15 + 333333333333333E-15 + 333333333333333E-15 = 999999999999999E-15

but the exact answer is 1000000000000000E-15 (i.e. 1E0). It doesn't matter how many bits of precision you add; you can never do this type of math exactly with decimal floating-point numbers. Any time the decimal expansion extends beyond the available precision, rounding choices have to be made at some point, and some calculation will be inexact.

Note: I am spouting from memory, so my apologies if I've gotten any terminology wrong (such as subnormal vs. denormal, and similar other ideas).

SDR

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users