[sqlite] Problem with floating point fields, and a feature request

Cariotoglou Mike Tue, 20 Sep 2005 06:58:52 -0700

There is an issue with floating point fields, which exists in every
database I have dealt with, and of course exists in sqlite as well, all
versions.
 Essentially, the issue is this:


"When are two floating point values considered equal ?"

Why is this an issue ?
Floating point values are some times used as part of keys. Even worse,
sql resolvers (engines that construct sql statements based on old/new
values of client data) frequently construst statements like this one:

update table
 set a=<some float>
where b=<some float>

such a statement will result from a user selecting <a>, editing, and
posting an update. Sqlite needs to find the relevant row, (either with
an index or not), which at some point will involve comparing
a float value stored in the database with a user-supplied float value.

Now, depending on how floats are stored, and how they are inputted, this
may or may not be an issue. in sqlite 2, where all is stored as text,
one could choose to store floats with a known precision,
and work around the inherently inaccurate comparison of floats. in
sqlite 3, we have a FLOAT storage class , which means that we now store
floats in binary form (which is good), and also means that
equality is based on comparing floats in a binary way ( all 8 bytes
being equal, in other words). 

This can/will be disastrous in many cases, one such was raised some time
ago by another user (the '9.95' issue). The above update can fail, for
no reason apparent to the user.

It also happens with people working in Delphi, and using native date
formats, which are actually floats. Dates are very often part of keys,
sorting, grouping etc, so all kinds of rounding errors can and will
manifest.

Of course, one can work around this problem, by NOT storing floats in
binary format, but in a well-defined text format. however, this would be
a pity, as a lot of time would be taken by sqlite and user code in
comverting back and forth, plus, in some cases, sqlite will try to
promote values stored as text to binary, re-introducing the problem. not
to mention that, floats stored as text will collate incorrectly, unless
they are right-justified and padded with '0', which increases the
storage requirements. 

This is not new, however, in sqlite we *do* have access to the people
that write the code, so a fix *can* be found, if there is a will and
understanding of the problem :) :)

All possible solutions to this issue have to do with the way that float
equality is established. It boils down to this:

two floats A and B should be compared with this algorithm :

diff=A-B
if (diff>tolerance) then A>B
else if (diff<-tolerance) then A<B
else A = B

in other words, a tolerance value should be used to compare them. What
happens with "standard" float arithmetic is that this tolerance is 0,
which leads to the wrong decision, when small rounding errors creep into
the numbers. The question then is, how does one define the tolerance.

The ideal way would be to define it at the column level, as a column
attribute. however, this would require extensive changes in the sqlite
code.

so, we could define it at the global level (not ideal, but it would do),
by :

1. adding a PRAGMA FLOATING_TOLERANCE=<something>

or

adding a user-defined callback (as you did with the REGEXP operator),
such as 

sqlite_compareFloat(a,b), which defaults to whatever code is now used to
compare two floats, but can be overriden by the user.

the <something> should be in terms of SIGNIFICANT digits, not number of
decimals, due to the way floating point works.

Having a capability like this would be of immence value to the stability
of database applications, and , I suspect, would not require a major
re-work. 

what do you think, DRH, is this doable ?

[sqlite] Problem with floating point fields, and a feature request

Reply via email to