[R] Unexpected R Behavior: Adding 4 to Large Numbers/IDs Containing Current Year

2011-06-29 Thread Christopher T. Moore

Hello,

I have encountered some unexpected behavior in R that seems to occur as a 
result of having the current year embedded in a number:





#Some large numbers, representing IDs.
IDs <- c(41255689815201100, 41255699815201100, 41255709815201100)

#In scientific notation
IDs 

[1] 4.125569e+16 4.125570e+16 4.125571e+16


#Change penalty.
options(scipen = 5)

#Why does R add 4?
IDs

[1] 41255689815201104 41255699815201104 41255709815201104


#Changing from numeric to character makes no difference.
as.character(IDs)

[1] "41255689815201104" "41255699815201104" "41255709815201104"


#What happens if I treat the numbers as characters?
IDs.character <- c("41255689815201100", "41255699815201100",
                   "41255709815201100")


#No change.
IDs.character

[1] "41255689815201100" "41255699815201100" "41255709815201100"


#R adds 4 upon converting to numeric.
as.numeric(IDs.character)

[1] 41255689815201104 41255699815201104 41255709815201104


#Is this problem occurring because the current year is embedded in the number?

IDs <- c(41255689815201100, 41255699815201000, 41255709815201200)

#R is no longer adding 4 to the numbers without 2011.
IDs

[1] 41255689815201104 41255699815201000 41255709815201200





Am I doing something wrong? Any insight on how I can avoid the problem of R 
changing numbers on its own? Are others able to replicate this example? Is 
this some kind of bug? Am I right that this problem is occurring because 
the current year is embedded in the number? I discovered this when trying 
to merge two data sets, one with IDs stored as numbers and one with IDs as 
characters. I have replicated this in Windows XP with R 2.12 and Windows 7 
with R 2.13 (both 32- and 64-bit versions).


Thanks,
Chris

--
Christopher T. Moore, M.P.P.
Doctoral Student
Quantitative Methods in Education
University of Minnesota
44.9785°N, 93.2396°W
moor0...@umn.edu
http://umn.edu/~moor0554

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Unexpected R Behavior: Adding 4 to Large Numbers/IDs Containing Current Year

2011-06-29 Thread Peter Langfelder
You seem to be running into the limits of double precision: your IDs
have 17 significant digits, which is more than a double-precision
floating-point number can hold without rounding error.
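
To make the rounding concrete, here is a minimal sketch in base R (the variable names are illustrative, not from the original post). It assumes the usual IEEE 754 doubles, which have a 53-bit significand, so integers are exact only up to 2^53. Between 2^55 and 2^56, where these IDs fall, adjacent doubles are 8 apart. An ID ending in 100 is not a multiple of 8, so it is rounded to the nearest representable value, which ends in 104; IDs ending in 000 or 200 are already multiples of 8 and are stored unchanged.

```r
# Doubles carry a 53-bit significand, so integers are exact only up to 2^53.
sig_bits <- .Machine$double.digits        # 53 on IEEE 754 platforms
limit <- 2^sig_bits
(limit + 1) == limit                      # TRUE: limit + 1 is not representable

# Spacing between adjacent doubles near the IDs from the post.
id <- 41255689815201100                   # parsed as a double
spacing <- 2^(floor(log2(id)) - (sig_bits - 1))
spacing                                   # 8

# Print the values that are actually stored, without scientific notation.
stored <- sprintf("%.0f",
                  c(41255689815201100, 41255699815201000, 41255709815201200))
stored
```

So the year has nothing to do with it; only the last few digits decide whether a value happens to be exactly representable.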

Since you are using these numbers as IDs, simply keep them as
character strings throughout your code, and nothing will ever change.
Or shorten the IDs by a few digits and your IDs will be safe again.
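
As a rough sketch of the keep-them-as-strings approach, assuming the IDs arrive in a CSV file: the data, column names, and the inline `csv_text` below are all made up for illustration. The idea is to tell `read.csv` up front that the ID column is character, so it is never parsed as a number, and then merge on the character key.

```r
# Hypothetical example data: this table already stores IDs as text.
left <- data.frame(id = c("41255689815201100", "41255699815201100"),
                   score = c(1, 2),
                   stringsAsFactors = FALSE)

# Read the second table with colClasses so the ID column stays character.
csv_text <- "id,group\n41255689815201100,a\n41255699815201100,b"
right <- read.csv(text = csv_text,
                  colClasses = c(id = "character"),
                  stringsAsFactors = FALSE)

# Merge on the character key; no digits are altered along the way.
merged <- merge(left, right, by = "id")
merged$id
```

If one data set has already been read in as numeric, the damage is done before `as.character()` is called, so the fix has to happen at import time.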

HTH,

Peter




Re: [R] Unexpected R Behavior: Adding 4 to Large Numbers/IDs Containing Current Year

2011-06-29 Thread David Winsemius


On Jun 29, 2011, at 2:29 PM, Christopher T. Moore wrote:


> Hello,
>
> I have encountered some unexpected behavior in R that seems to occur
> as a result of having the current year embedded in a number:


No, that is not the explanation.





> #Some large numbers, representing IDs.
> IDs <- c(41255689815201100, 41255699815201100, 41255709815201100)


41255689815201100 > 2*10^9
[1] TRUE

So you may think you are working with integers, but you are in fact
working with floating-point numbers. See the R FAQ.
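
A small sketch of that point in base R (the names `x` and `as_int` are illustrative): a numeric literal without an `L` suffix is stored as a double, and R's integer type is 32-bit, so values this large cannot be integers at all.

```r
# A numeric literal without an L suffix is a double, not an integer.
x <- 41255689815201100
typeof(x)                        # "double"
is.integer(x)                    # FALSE

# R's integer type is 32-bit, so its largest value is about 2.1 * 10^9.
.Machine$integer.max             # 2147483647
x > .Machine$integer.max         # TRUE

# Forcing the value into an integer yields NA (with a warning).
as_int <- suppressWarnings(as.integer(x))
is.na(as_int)                    # TRUE
```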



--
David.



David Winsemius, MD
West Hartford, CT
