[R] Unexpected R Behavior: Adding 4 to Large Numbers/IDs Containing Current Year
Hello,

I have encountered some unexpected behavior in R that seems to occur as a result of having the current year embedded in a number:

#Some large numbers, representing IDs.
IDs <- c(41255689815201100, 41255699815201100, 41255709815201100)

#In scientific notation
IDs
[1] 4.125569e+16 4.125570e+16 4.125571e+16

#Change penalty.
options(scipen = 5)

#Why does R add 4?
IDs
[1] 41255689815201104 41255699815201104 41255709815201104

#Changing from numeric to character makes no difference.
as.character(IDs)
[1] "41255689815201104" "41255699815201104" "41255709815201104"

#What happens if I treat the numbers as characters?
IDs.character <- c("41255689815201100", "41255699815201100", "41255709815201100")

#No change.
IDs.character
[1] "41255689815201100" "41255699815201100" "41255709815201100"

#R adds 4 upon converting to numeric.
as.numeric(IDs.character)
[1] 41255689815201104 41255699815201104 41255709815201104

#Is this problem occurring because the current year is embedded in the number?
IDs <- c(41255689815201100, 41255699815201000, 41255709815201200)

#R is no longer adding 4 to the numbers without 2011.
IDs
[1] 41255689815201104 41255699815201000 41255709815201200

Am I doing something wrong? Any insight on how I can avoid the problem of R changing numbers on its own? Are others able to replicate this example? Is this some kind of bug? Am I right that this problem is occurring because the current year is embedded in the number?

I discovered this when trying to merge two data sets, one with IDs stored as numbers and one with IDs stored as characters. I have replicated this in Windows XP with R 2.12 and Windows 7 with R 2.13 (both 32- and 64-bit versions).

Thanks,
Chris

--
Christopher T. Moore, M.P.P.
Doctoral Student
Quantitative Methods in Education
University of Minnesota
44.9785°N, 93.2396°W
moor0...@umn.edu
http://umn.edu/~moor0554

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Unexpected R Behavior: Adding 4 to Large Numbers/IDs Containing Current Year
You seem to be running into the limits of double precision: your IDs have 17 significant digits, which is more than a double-precision floating-point number can hold without rounding error.

Since you are using these numbers as IDs, simply keep them as character strings throughout your code, and nothing will ever change. Or shorten the IDs by a few digits and your IDs will be safe again.

HTH,

Peter

On Wed, Jun 29, 2011 at 11:29 AM, Christopher T. Moore moor0...@umn.edu wrote:
> Hello, I have encountered some unexpected behavior in R that seems to occur as a result of having the current year embedded in a number:
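A minimal sketch of the "keep them as character strings" advice above. The data frames `a` and `b`, their column names, and the CSV text are hypothetical and exist only for illustration; only the ID values come from the original post.

```r
# Doubles represent every integer exactly only up to 2^53.
limit <- 2^53
sprintf("%.0f", limit)                 # "9007199254740992"

# The IDs from the original post, kept as character strings.
ids_chr <- c("41255689815201100", "41255699815201100", "41255709815201100")

# All of them exceed 2^53, so a round trip through numeric is not safe.
exceeds <- as.numeric(ids_chr) > limit

# Hypothetical data sets: both carry the ID as a character column.
a <- data.frame(id = ids_chr, score = c(10, 20, 30), stringsAsFactors = FALSE)
b <- data.frame(id = ids_chr, group = c("x", "y", "z"), stringsAsFactors = FALSE)

# Merging on character keys compares the strings, so no digit can change.
merged <- merge(a, b, by = "id")

# When reading from a file, ask for a character column up front
# (the column name "id" is an assumption about the file layout).
csv_text <- "id,score\n41255689815201100,10\n41255699815201100,20"
scores <- read.csv(text = csv_text, colClasses = c(id = "character"))
```

If one data set already holds the IDs as numbers, the low-order digits may have been lost before the merge, so the safer fix is to re-read that source with the ID column declared as character rather than to convert afterwards.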
Re: [R] Unexpected R Behavior: Adding 4 to Large Numbers/IDs Containing Current Year
On Jun 29, 2011, at 2:29 PM, Christopher T. Moore wrote:

> Hello, I have encountered some unexpected behavior in R that seems to occur as a result of having the current year embedded in a number:

No, that is not the explanation.

> #Some large numbers, representing IDs.
> IDs <- c(41255689815201100, 41255699815201100, 41255709815201100)

41255689815201100 > 2*10^9
[1] TRUE

So you may think you are working with integers, but you are in fact working with floating-point numbers. See the R-FAQ.

--
David.

David Winsemius, MD
West Hartford, CT
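The pattern in the original post (IDs ending in 100 change, IDs ending in 000 or 200 do not) follows from the spacing of doubles at this magnitude rather than from the digits 2011. The sketch below checks that; the variable names are illustrative, and it assumes the usual IEEE 754 doubles with a 53-bit significand.

```r
# The three IDs from the last example in the original post, as strings.
ids_chr <- c("41255689815201100", "41255699815201000", "41255709815201200")
ids_num <- as.numeric(ids_chr)

# These are doubles, not integers; R's integer type tops out far lower.
is.integer(ids_num)        # FALSE
.Machine$integer.max       # 2147483647

# A double has a 53-bit significand, so for a value in [2^e, 2^(e + 1))
# adjacent representable numbers are 2^(e - 52) apart.
exponent <- floor(log2(ids_num))
spacing  <- 2^(exponent - 52)
spacing                    # 8 8 8

# 1000 is divisible by 8, so an integer's remainder mod 8 depends only
# on its last three decimal digits.
last3     <- as.integer(substring(ids_chr, nchar(ids_chr) - 2))
remainder <- last3 %% 8
remainder                  # 4 0 0
```

An ID ending in 100 leaves remainder 4, so it sits exactly halfway between two representable doubles (ending in 096 and 104) and has to be rounded to one of them; the original post shows it stored as the value ending in 104. The IDs ending in 000 and 200 are multiples of 8 and are stored exactly.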