Re: [R] strangely long floating point with write.table()

Mike Miller Sat, 15 Mar 2014 23:15:28 -0700

On Sat, 15 Mar 2014, peter dalgaard wrote:

On 15 Mar 2014, at 20:54 , Mike Miller <mbmille...@gmail.com> wrote:

$ cat data1.txt
0.005
0.00499999999999989

I don't know why it shows 17 digits and doesn't round to 15, but it is showing 
that the numbers are different, for some reason.


Aiding my weakening eyesight a little:

0.004 999 999 999 999 89

Notice that that makes 15 _significant_ digits.

OK, now I feel really stupid. Of course it's 15 mantissa digits, not 15 %f digits, or whatever that should be called. Sorry about that.
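A minimal sketch of the point about significant digits, assuming the two values in data1.txt came from 1-0.995 and 2-1.995 as discussed later in this message. The variable names and the temporary file are made up for illustration:

```r
# The two expressions look equal on paper but are stored as different doubles.
a <- 1 - 0.995
b <- 2 - 1.995
a == b                      # FALSE

# Printed with 15 significant digits, the difference becomes visible.
sprintf("%.15g", a)         # "0.005"
sprintf("%.15g", b)         # "0.00499999999999989"

# write.table() on the raw doubles should reproduce the two lines of data1.txt.
f <- tempfile(fileext = ".txt")
write.table(data.frame(x = c(a, b)), file = f,
            row.names = FALSE, col.names = FALSE, quote = FALSE)
readLines(f)
```

The leading zeros in 0.00499999999999989 are not significant, which is why 17 digits after the decimal point still amount to only 15 significant digits.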

Do you understand why there is a difference between 1-0.995 and 2-1.995 in their internal representations?
Let's see,  that'll be like

1 - 2/3 vs. 10 - 29/3
on a decimal computer if someone is perverse enough to give input in base 3 (i.e., 1.0 - 0.2 ternary vs. 101.0 - 100.2 ternary). Assume that the computer is floating point with 3 significant digits (and possibly taking some liberties compared to what real computers really do), we have
   1 = 1.000 * 10^0
  10 = 1.000 * 10^1
 2/3 = 0.667 * 10^0
29/3 = 0.967 * 10^1

1 - 2/3  = 0.333 * 10^0
10 - 29/3 = 0.033 * 10^1 = 0.330 * 10^0

So, yes, I think I do understand how these things can happen.
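The toy decimal machine can be imitated in R, as a rough sketch only: signif(x, 3) stands in for "store with 3 significant decimal digits", which is an assumption of this illustration and not how real binary doubles behave.

```r
# Store each input with 3 significant decimal digits, then subtract.
d1 <- 1  - signif(2/3, 3)    # 1  - 0.667
d2 <- 10 - signif(29/3, 3)   # 10 - 9.67

# Both differences stand for 1/3, but the second has lost a digit:
# roughly 0.333 versus 0.33.
signif(c(d1, d2), 3)
```

The larger the operands relative to their difference, the fewer meaningful digits survive the subtraction, which is the same effect as 1-0.995 versus 2-1.995.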

Yes, and that's a nice explanation, but you had me at "_significant_". I don't know why I didn't get that in the first place. So the difference in my example is that 0.995 is 9.950e-1 so that the 5 is the third significant digit and in 1.995, the 5 is the fourth significant digit, so 1-0.995 provides a more precise representation of 0.005 than does 2-1.995.

I always knew there was some numerical reason why I was getting very long stretches of 9s or 0s in the write.table() output, but my concern is really with how to prevent that from happening. So the question still is, how do I avoid getting 0.00499999999999989 in my output file when I want 0.005? I'm sure I'm not alone in this. It looks like the standard answer is to use format(). For example, I could do this:

write.table(format(data, digits=13, trim=TRUE), file="data.txt",
            row.names=FALSE, col.names=FALSE, quote=FALSE)

That does fix the long numbers -- all of them are reduced to three digits. The one thing that concerns me is that when format() is called, isn't it making an object that could take up a lot of memory if the data frame is large? The data frame created by format() might use a lot more memory than the original data frame if it is converting a lot of doubles (8 bytes) to a lot of possibly 16-byte strings. For example, -10/81 takes up 8 bytes as a double, but converted by format with digits=13 it uses 16 bytes to include the sign, the zero and the decimal point (plus a delimiter when there are many per line of output):

write.table(format(-10/81, digits=13), row.names=FALSE, col.names=FALSE, quote=FALSE)

-0.1234567901235
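The memory concern can be checked directly with object.size(). This is only a rough sketch on made-up data (the vector x is hypothetical), and the reported sizes depend on the platform and R version, so none are quoted here:

```r
# The formatted value from the example above is 16 characters long.
nchar(format(-10/81, digits = 13))         # 16

# Compare the footprint of doubles with that of their formatted strings.
x <- (1:100000) / 81                       # 100,000 distinct doubles
s <- format(x, digits = 13, trim = TRUE)   # same values as character strings

object.size(x)                             # numeric vector
object.size(s)                             # character vector, expected to be larger
```

R keeps some bookkeeping overhead for every distinct string, so the character version may well cost more than the 16 bytes per value estimated above.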

I'm assuming that write.table() is streaming the data into a file (or stdout) and not creating a complete representation of the output in memory before it does that. It looks like format() creates a data frame where all variables are converted to character type. Thus, it wouldn't be just for convenience that one might want digits=N to be an option in the write.table() function. It would be very useful with large data frames, making it possible to write out things that would be too large to handle using format(). When files are already super-large, we really want to avoid expanding the number of digits per value in the output.
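One possible workaround, offered here as an assumption and not as anything this thread confirms, is to round the double columns with signif() before writing, so the data stay numeric and no character copy is created. The helper name round_numeric_cols and the sample data frame are hypothetical:

```r
# Round every double column to `digits` significant digits, leaving other
# columns untouched. The result is still numeric, not character.
round_numeric_cols <- function(df, digits = 13) {
  is_dbl <- vapply(df, is.double, logical(1))
  if (any(is_dbl)) {
    df[is_dbl] <- lapply(df[is_dbl], signif, digits = digits)
  }
  df
}

dat <- data.frame(id = 1:2, x = c(1 - 0.995, 2 - 1.995))
out <- tempfile(fileext = ".txt")
write.table(round_numeric_cols(dat), file = out,
            row.names = FALSE, col.names = FALSE, quote = FALSE)
readLines(out)   # "1 0.005" "2 0.005"
```

This still copies the rounded columns, but as 8-byte doubles and not as strings. Unlike format(), it does not pad values in a column to a common number of decimals.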


Mike

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
