So here is the epsilon value for various scales of double precision floating point numbers, where scale is the base-10 exponent of the magnitude. A sketch of the kind of loop that produces such a table follows, and then the values themselves:
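The sketch below assumes Python 3.9 or later for math.nextafter and is not necessarily how the original listing was generated; epsilon here means the gap between 10**scale and the next representable double above it:

    import math

    # For each decimal scale, print the gap between 10**scale and the next
    # representable double above it, i.e. the smallest increment that
    # "registers" at that magnitude.
    for scale in range(32):
        x = 10.0 ** scale
        print("scale=", scale, "epsilon=", math.nextafter(x, math.inf) - x)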
scale= 0 epsilon= 2.220446049250313e-16
scale= 1 epsilon= 1.7763568394002505e-15
scale= 2 epsilon= 1.4210854715202004e-14
scale= 3 epsilon= 1.1368683772161603e-13
scale= 4 epsilon= 1.8189894035458565e-12
scale= 5 epsilon= 1.4551915228366852e-11
scale= 6 epsilon= 1.1641532182693481e-10
scale= 7 epsilon= 1.862645149230957e-09
scale= 8 epsilon= 1.4901161193847656e-08
scale= 9 epsilon= 1.1920928955078125e-07
scale= 10 epsilon= 1.9073486328125e-06
scale= 11 epsilon= 1.52587890625e-05
scale= 12 epsilon= 0.0001220703125
scale= 13 epsilon= 0.001953125
scale= 14 epsilon= 0.015625
scale= 15 epsilon= 0.125
scale= 16 epsilon= 2.0
scale= 17 epsilon= 16.0
scale= 18 epsilon= 128.0
scale= 19 epsilon= 2048.0
scale= 20 epsilon= 16384.0
scale= 21 epsilon= 131072.0
scale= 22 epsilon= 2097152.0
scale= 23 epsilon= 16777216.0
scale= 24 epsilon= 134217728.0
scale= 25 epsilon= 2147483648.0
scale= 26 epsilon= 17179869184.0
scale= 27 epsilon= 137438953472.0
scale= 28 epsilon= 2199023255552.0
scale= 29 epsilon= 17592186044416.0
scale= 30 epsilon= 140737488355328.0
scale= 31 epsilon= 1125899906842624.0

As you can see, the precision depends on the magnitude of the number. If you are doing operations on numbers that are all "relatively close" to each other in scale, the precision will be maintained. However, if you do operations that mix scales, the precision of the result may be quite different from what one expects.

So, for example, suppose you are averaging 1000 fp numbers that are each in the range of 1000 (and thus carry at most about 13 accurate digits after the decimal point). By the time you have added them all up, the sum (which is in the range of 1,000,000) only has about 10 accurate digits after the decimal point. When you then divide by 1000, the resulting average will only be accurate to a maximum of about 10 digits after the decimal point, not the 13 that you might expect from the size of the result.

So, when computing an average that spans lots of values of varying sizes, you need either to do a running calculation (which keeps the "average so far" and moves it by each additional value's contribution, so that as more and more values arrive the effect of each one becomes smaller and smaller), or to sort the input data by absolute magnitude and sum the smallest (in absolute value) first. Both approaches are sketched below.

This is a simple example, but I am sure you can imagine how the effect of a number of sequential operations (mixed additions and multiplications) can get out of hand quite quickly. Modern CPUs mitigate the effect somewhat by performing the individual operations in extended (higher precision) floating point and then converting the result back to double precision, which allows a much larger variability in the scale of the operands without significantly affecting the precision of the result.

As to ordering the set, I doubt it makes much difference whether you use a WITH subquery to generate the sorted set or do it in a subquery in the FROM clause. Of course, whether there is any difference in the result at all depends on the scale spread of the numbers you are working with.
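Here is a hedged sketch, in plain Python, of those two approaches (the running "average so far" and summation in order of increasing magnitude) against a naive left-to-right sum; the function names and the sample data are purely illustrative and use the pennies-and-a-trillion-dollars situation described further below:

    def plain_sum(values):
        # Plain left-to-right IEEE double additions, no compensation.
        total = 0.0
        for x in values:
            total += x
        return total

    def running_mean(values):
        # Keep the "average so far" and fold in each value's share;
        # later values move the mean less and less.
        mean = 0.0
        for n, x in enumerate(values, start=1):
            mean += (x - mean) / n
        return mean

    # One huge value followed by a million pennies; the exact mean is
    # 1000000010000 / 1000001.
    data = [1e12] + [0.01] * 1_000_000

    print("naive order :", plain_sum(data) / len(data))
    print("running mean:", running_mean(data))
    print("sorted first:", plain_sum(sorted(data, key=abs)) / len(data))

On a data set like this the sorted-first sum should land closest to the exact mean, because the pennies get to accumulate before they meet the trillion, while the naive order should drift the furthest from it.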
In SQL,

    select sum(i)/count(i) from (select i from dataset order by abs(i));

may or may not be (significantly) different from the simple

    select avg(i) from dataset;

however, the former (sorted input) will generate the more precise result, especially if there are many "smaller scale" numbers relative to the number of "larger scale" numbers, since those may accumulate to a value that is representable within the epsilon of the larger values, whereas if you do the sum in the other direction, the impact of the smaller scale numbers is just noise at the limits of precision.

(So, for example, if you were doing money, you would add up the pennies, nickels, dimes, etc. first. They may accumulate to a large enough number to affect the sum even if one of the values added to the sum is a trillion dollars. If you do it the other way around, the effect of adding a penny to a trillion dollars is negligible, though it would still be within the precision limits of double precision floating point.)

> -----Original Message-----
> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> On Behalf Of Jean-Marie CUAZ
> Sent: Wednesday, 12 July, 2017 03:34
> To: sqlite-users@mailinglists.sqlite.org
> Subject: Re: [sqlite] Example/recipe for truncating fp numbers
>
> Thank you M. Medcalf for your nice explanation.
>
> In my first post, I gave half of the solution used: when storing a
> "truncated value", any remaining digits are also stored, separately,
> as a whole integer.
>
> Both parts are reassembled later when needed (i.e. when doing set
> aggregation). The intent is to (imperfectly) recover the precision lost
> to truncation at the elementary level: naive and ugly, but the best
> I am able to do.
>
> From your explanation, can I conclude that when doing summation on
> large sets of fp numbers, a prior sort on this set gives a better chance
> of an accurate result? In SQL, if the use of a separate table to be
> filled in sorted order is not wanted, what would be the differences in
> efficiency/memory usage between a sub-select in the FROM clause and the
> same ordered set processed via a WITH clause?
>
> Thanks again.
>
> -jm
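P.S. If you want to check whether the ordering makes a measurable difference on your own data, here is a quick hedged sketch using Python's standard sqlite3 module (the database file name is hypothetical; "dataset" and "i" are the same placeholder names used above):

    import sqlite3

    con = sqlite3.connect("yourdata.db")   # hypothetical file name

    # The simple aggregate versus the magnitude-ordered subquery from above.
    plain = con.execute("select avg(i) from dataset").fetchone()[0]
    ordered = con.execute(
        "select sum(i)/count(i) from (select i from dataset order by abs(i))"
    ).fetchone()[0]

    print("avg(i)           :", plain)
    print("ordered sum/count:", ordered)
    print("difference       :", plain - ordered)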