On Mon, Dec 17, 2018 at 2:50 PM Thomas Kurz <sqlite.2...@t-net.ruhr> wrote:

> Ok, as there seem to be some experts about floating-point numbers here,
> there is one aspect that I never understood:
>
> floats are stored as a fractional part, which is binary encoded, and an
> integer-type exponent. The first leads to the famous rounding errors as
> there is no exact representation of most fractions.
>
> Can someone explain to me why it has been defined this way? Having 1 bit
> sign, 11 bit exponent, and 52 bit mantissa, I would have stored the (in the
> meantime well known) number 211496.26 as 21149626E-2, i.e. I would have
> stored a 52 bit integer number and appropriate exponent. This way there
> should be no rounding errors and one would always have a guaranteed
> precision of ~15 significant digits.
>

To get the maximum precision possible from a binary floating point number,
the designers of the format took advantage of the fact that all numbers
other than zero would have a 1 bit set somewhere in their representation.
To that end, "normal" floating point numbers actually have a 53 bit
mantissa. "But that equals 65 bits! You can't cram 65 bits into a 64 bit
word." But you can if the most significant set bit of the mantissa is
implied just to the left of the explicitly given 52 bits of the mantissa.
The most significant digit of a decimal number can be any value from 1
through 9, so you can't use this same trick to extend the precision of a
decimal floating point number.
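You can see the layout directly by pulling a double apart bit by bit. A
small Python sketch (field names here are my own, not from any standard
library API):

```python
import struct

def decompose(x):
    """Split an IEEE 754 binary64 value into its sign bit,
    11-bit biased exponent field, and 52 explicitly stored
    mantissa (fraction) bits."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF      # biased by 1023
    mantissa = bits & ((1 << 52) - 1)    # the implicit leading 1 is NOT stored
    return sign, exponent, mantissa

# 1.5 is binary 1.1 x 2^0: the leading "1." is implied, so only the
# fractional ".1" survives in the stored 52 bits (a 1 in the top position).
print(decompose(1.5))   # (0, 1023, 2251799813685248) -- 2251799813685248 == 1 << 51
```

Note that the stored mantissa for 1.5 has no trace of the leading 1; that
bit is exactly the "free" 53rd bit of precision described above.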

In addition to normal numbers, there are subnormal numbers, where the
implicit leftmost bit is a 0 rather than a 1. The value zero is encoded as a
subnormal with all bits set to zero.
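The encoding marks subnormals with an all-zero exponent field, which is
easy to verify (again a sketch using Python's struct module):

```python
import struct

def raw(x):
    """Return the 64-bit pattern of a double as an unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

smallest = 5e-324                      # smallest positive subnormal, 2**-1074
exponent_field = (raw(smallest) >> 52) & 0x7FF
print(exponent_field)                  # 0: all-zero exponent marks a subnormal

# Zero is simply the all-zeros bit pattern: a subnormal with a zero mantissa.
print(raw(0.0))                        # 0
```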

Even without the implicit bit, many (if not most) schemes for encoding
decimal digits in binary waste some portion of the range that a pure binary
representation offers, and the IEEE designers wanted the best of both
worlds, range and precision, so they gave up exact decimal representation
in favor of binary.

The encoding you describe is essentially what the decimal type on the .NET
platform does, among other examples, but the available range is smaller
than that of an IEEE binary floating point number of the same size.
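Python's decimal module works the same way as the scheme you propose (an
integer coefficient plus a base-10 exponent), and it does store 211496.26
exactly:

```python
from decimal import Decimal

# Constructed from a string, the value is held as the integer 21149626
# with exponent -2 -- exactly the 21149626E-2 encoding suggested above.
d = Decimal("211496.26")
print(d.as_tuple())   # DecimalTuple(sign=0, digits=(2, 1, 1, 4, 9, 6, 2, 6), exponent=-2)

# Constructed from a binary double, the rounding error is already baked in:
print(Decimal(211496.26))   # the nearest representable binary64 value, not 211496.26
```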

As far as it goes, you can still have rounding errors that propagate with a
decimal scheme such as you suggest. Simply add 1/3 + 1/3 + 1/3 in a decimal
representation.

333333333333333E-15 + 333333333333333E-15 + 333333333333333E-15 =
999999999999999E-15. But it should be 1000000000000000E-15 (or 1E0). It
doesn't matter how many digits of precision you add, you can never do this
type of math exactly with decimal floating point numbers. Any time the
decimal expansion of a result extends beyond the available precision,
rounding choices are going to have to be made at some point, and some
calculation will be inexact.
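The same decimal module demonstrates this directly at the ~15 significant
digits the original post proposed:

```python
from decimal import Decimal, getcontext

getcontext().prec = 15                  # 15 significant decimal digits

third = Decimal(1) / Decimal(3)         # rounds to 0.333333333333333
total = third + third + third
print(total)                            # 0.999999999999999, not 1
print(total == 1)                       # False: the rounding error propagated
```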

Note: I am spouting from memory, so my apologies if I've gotten any
terminology wrong (such as subnormal vs. denormal, or similar mix-ups).

SDR
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
