On Tue, 2003-06-03 at 08:32, Duncan Murdoch wrote: > On Tue, 3 Jun 2003 09:49:44 +0100, you wrote in message > <[EMAIL PROTECTED]>: > > >Duncan > >If the numbers are not represently exactly how does R resolve problems like > >the one below? Is there something that needs to be set up in the R > >environment like the number of significant figures? > > > >> x<-4.145*100+0.5 > >> x > >[1] 415 > >> floor(x) > >[1] 414 > > R doesn't do anything to resolve this problem; it's just the way the > IEEE standard floating point formats work. In Excel 97, 4.145*100+0.5 > is exactly equal to 415; I would guess this is either because they use > a binary coded decimal format instead of the IEEE floating point > types, or they round results internally in some way. R doesn't > support BCD formats, and doesn't do tricky rounding behind your back. > You get what you ask for. > > If you want the calculation above to give you exactly 415, the > standard workaround in languages without BCD formats is to work in > some decimal multiple of the actual numbers you're interested in, e.g. > 10000. Then you would store 4.145 as 41450, multiply by 1000000 (i.e. > 100*10000) and divide by 10000 to give 4145000, and add 5000, to give > 4150000. All of these numbers are exactly representable in double > precision floating point types, because they are all integers with > fewer than 53 bits in their binary representations. > > Doing this means you need to change the definitions of *, /, ^, and > lots of other low level functions, but + and - work in the usual way. > It might be an interesting project to write a package that does all of > this. > > Duncan Murdoch
In Excel, the IEEE standard (754) is used to internally represent floats. A MS-KB article on this is here: http://support.microsoft.com/default.aspx?scid=kb;[LN];214118 Another, more detailed, is here: http://support.microsoft.com/default.aspx?scid=kb;EN-US;78113 What is curious about this situation, and apropos to Prof. Ripley's comments about the difference between internal representation, rounding and displayed values, is the following information. Note how the results of cell calculations differ between Excel, OpenOffice.org Calc and Gnumeric. In each case, I use a format setting of 20 digits after the decimal place with scientific notation. This is best read with a fixed width font. OOo Calc 1.0.2 and 1.1 Beta2: Cell Formula Value = 4.145 * 100 + 0.5 4.15000000000000000000E+02 = 0.5 - 0.4 - 0.1 0.00000000000000000000E+00 =(0.5 - 0.4 - 0.1) 0.00000000000000000000E+00 Excel 2002 (XP): Cell Formula Value = 4.145 * 100 + 0.5 4.15000000000000000000E+02 = 0.5 - 0.4 - 0.1 0.00000000000000000000E+00 =(0.5 - 0.4 - 0.1) -2.77555756156289000000E-17 Gnumeric 1.0.12: Cell Formula Value = 4.145 * 100 + 0.5 +4.14999999999999943157E+02 = 0.5 - 0.4 - 0.1 -2.77555756156289135106E-17 *Gnumeric does not appear to allow the surrounding parens. For comparison, R 1.7.1 Beta under RH 9 and WinXP: > print(4.145 * 100 + 0.5, digits = 20) [1] 414.99999999999994 > formatC(4.145 * 100 + 0.5, format = "E", digits = 20) [1] "4.14999999999999943157E+02" > print(0.5 - 0.4 - 0.1, digits = 20) [1] -2.775557561562891e-17 > formatC(0.5 - 0.4 - 0.1, format = "E", digits = 20) [1] "-2.77555756156289135106E-17" What is interesting is the change in the displayed value in Excel when the second formula is surrounded by parens (which I found purely by accident). This would suggest that there may be something going on in the parsing of the cell formula that affects the calculation and displayed value. Also note the precision of the resultant number. Presuming that each of the spreadsheet programs are using IEEE standard internal representation, there are clearly differences in the way in which each visually displays the values, both by default and when explicitly formatted. Using the following cell formula: = 1.333 + 1.225 - 1.333 - 1.225 there is an indication in the second MS-KB article above, that Excel 97 introduced an "optimization" dealing with results near zero. "The example above when performed in Excel 97 and later correctly displays 0 or 0.000000000000000E+00 in scientific notation." whereas "Rather than displaying 0, Excel 95 displays -2.22044604925031E-16." The terms "optimization" and "correctly displays" are an interesting choice of words. I have a post to one of the OOo forums regarding my inability to replicate the IEEE precision issues in Calc under any circumstances using the three formulas and any numeric formatting options. It may be that the OOo folks copied the MS Excel "optimization" with no override. FYI...the IEEE has a reference site for the standard here: http://grouper.ieee.org/groups/754/ HTH, Marc Schwartz ______________________________________________ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
