So here is the epsilon value (the spacing between adjacent representable 
values) for various scales of double-precision floating-point numbers (scale is 
the absolute base-10 exponent):

scale=   0 epsilon= 2.220446049250313e-16
scale=   1 epsilon= 1.7763568394002505e-15
scale=   2 epsilon= 1.4210854715202004e-14
scale=   3 epsilon= 1.1368683772161603e-13
scale=   4 epsilon= 1.8189894035458565e-12
scale=   5 epsilon= 1.4551915228366852e-11
scale=   6 epsilon= 1.1641532182693481e-10
scale=   7 epsilon= 1.862645149230957e-09
scale=   8 epsilon= 1.4901161193847656e-08
scale=   9 epsilon= 1.1920928955078125e-07
scale=  10 epsilon= 1.9073486328125e-06
scale=  11 epsilon= 1.52587890625e-05
scale=  12 epsilon= 0.0001220703125
scale=  13 epsilon= 0.001953125
scale=  14 epsilon= 0.015625
scale=  15 epsilon= 0.125
scale=  16 epsilon= 2.0
scale=  17 epsilon= 16.0
scale=  18 epsilon= 128.0
scale=  19 epsilon= 2048.0
scale=  20 epsilon= 16384.0
scale=  21 epsilon= 131072.0
scale=  22 epsilon= 2097152.0
scale=  23 epsilon= 16777216.0
scale=  24 epsilon= 134217728.0
scale=  25 epsilon= 2147483648.0
scale=  26 epsilon= 17179869184.0
scale=  27 epsilon= 137438953472.0
scale=  28 epsilon= 2199023255552.0
scale=  29 epsilon= 17592186044416.0
scale=  30 epsilon= 140737488355328.0
scale=  31 epsilon= 1125899906842624.0
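
For reference, a table like the one above can be reproduced with a few lines of 
Python (a sketch, assuming Python 3.9 or later for math.ulp; the formatting is 
only an approximation of the output above):

import math

# Print the spacing between adjacent doubles (the "epsilon") at each
# power-of-ten scale, similar to the table above.
for scale in range(32):
    x = 10.0 ** scale
    print("scale=%4d epsilon= %r" % (scale, math.ulp(x)))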

As you can see, the precision depends on the size of the number.  So if you are 
doing operations on numbers that are all "relatively close" to each other in 
scale, the precision will be maintained.  However, if you do operations that 
mix scales, the precision of the result may be quite different from what one 
expects.

So, for example, if you are averaging 1000 fp numbers that are each around 1000 
in magnitude (and thus accurate to at most about 13 digits after the decimal 
point), then by the time you have added them all up the sum is around 1,000,000 
and is accurate to only about 10 digits after the decimal point.  When you then 
divide by 1000, the resulting average will only be accurate to a maximum of 10 
digits after the decimal point, not the 13 that you might expect from the size 
of the result.
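
A quick way to see where the 13 and the 10 come from is to compare the epsilon 
at the scale of the inputs with the epsilon at the scale of the sum (this just 
restates the scale 3 and scale 6 rows of the table above):

import math

# Spacing of doubles at the scale of the inputs vs. the scale of the sum.
print(math.ulp(1.0e3))   # ~1.14e-13 -> about 13 accurate digits after the decimal point
print(math.ulp(1.0e6))   # ~1.16e-10 -> about 10 accurate digits after the decimal point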

So, when computing an average that spans lots of values of varying sizes, you 
need to either do a running calculation (which keeps the "average so far" and 
moves it by the impact of each additional value, so that as more and more 
values are included the effect of each one becomes smaller and smaller), or 
sort the input data by absolute magnitude and sum the smallest (in absolute 
value) first.
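
A minimal sketch of the running-calculation approach (Python again; the 
function name is just for illustration):

def running_mean(values):
    # Incremental mean: avg_n = avg_(n-1) + (x_n - avg_(n-1)) / n.
    # The working value stays at the scale of the average itself, so it
    # never grows to the scale of the full sum.
    avg = 0.0
    for n, x in enumerate(values, start=1):
        avg += (x - avg) / n
    return avg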

This is a simple example, but I am sure you can imagine how the effect of a 
sequence of operations (mixed additions and multiplications) can get out of 
hand quite quickly.  Modern CPUs mitigate the effects somewhat by performing 
each individual operation in extended (higher-precision) floating point and 
then converting the result back to double precision, which allows a much larger 
spread in the scale of the operands without significantly affecting the 
precision of the result.

As to ordering the set, I doubt it makes much difference whether you use a WITH 
clause to generate the sorted set or do it in a subquery in the FROM clause.  
Of course, whether there is any difference in the result at all depends on the 
scale spread of the numbers you are working with.

select sum(i)/count(i) from (select i from dataset order by abs(i));

may or may not be (significantly) different from the simple

select avg(i) from dataset;

however the former (sorted input) will generate the more precise result, 
especially if there are many "smaller scale" numbers relative to the number of 
"larger scale" numbers, since those may accumulate to a value that is 
representable within the epsilon of the larger value, whereas if you do the sum 
in the other direction the impact of the smaller scale numbers is just noise at 
the limits of precision.  (So, for example, if you were doing money, you would 
add up the pennies, nickels, dimes, etc. first.  They may accumulate to a large 
enough number to affect the sum even if one of the values added to the sum is a 
trillion dollars.  If you do it the other way around, the effect of adding a 
penny to a trillion dollars is negligible -- though it would still be within 
the precision limits of double-precision floating point.)
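
Here is a hedged sketch of that money example (the specific values are made up 
for illustration; math.fsum is used only as a correctly rounded reference):

import math

# One huge value plus a million pennies, summed smallest-first and
# largest-first, compared against a correctly rounded reference sum.
values = [1.0e12] + [0.01] * 1_000_000

def naive_sum(xs):
    total = 0.0
    for x in xs:
        total += x
    return total

exact = math.fsum(values)
print("smallest-first error:", naive_sum(sorted(values, key=abs)) - exact)
print("largest-first error: ", naive_sum(sorted(values, key=abs, reverse=True)) - exact)

Summing smallest-first will typically land closer to the correctly rounded 
result.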


> -----Original Message-----
> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> On Behalf Of Jean-Marie CUAZ
> Sent: Wednesday, 12 July, 2017 03:34
> To: sqlite-users@mailinglists.sqlite.org
> Subject: Re: [sqlite] Example/recipe for truncating fp numbers
> 
> Thank you M. Medcalf for your nice explanation.
> 
> In my first post, I gave half of the solution used: when storing a
> "truncated value", any remaining digits are also stored separately as a
> whole integer.
> 
> Both parts are reassembled later when needed (i.e. when doing set
> aggregation). The intent is to (imperfectly) recover the precision lost to
> truncation at the elementary level: naïve and ugly, but the best I'm able
> to do.
> 
> From your explanation, can I conclude that when doing summation on large
> sets of fp numbers, sorting the set first gives a better chance of an
> accurate result?  In SQL, if using a separate table filled in sorted order
> is not wanted, what would be the difference in efficiency/memory usage
> between a sub-select in the FROM clause and the same ordered set processed
> via a WITH clause?
> 
> Thanks again.
> 
> -jm
> 


