Re: [Rd] Randomness not due to seed
On Tue, Jul 19, 2011 at 8:13 AM, jeroen00ms jeroen.o...@stat.ucla.edu wrote:
> I am working on a reproducible computing platform for which I would like to
> be able to _exactly_ reproduce an R object. However, I am experiencing
> unexpected randomness in some calculations. I have a hard time finding out
> exactly how it occurs. The code below illustrates the issue.
>
> mylm1 <- lm(dist~speed, data=cars);
> mylm2 <- lm(dist~speed, data=cars);
> identical(mylm1, mylm2); #TRUE
>
> makelm <- function(){
>   return(lm(dist~speed, data=cars));
> }
>
> mylm1 <- makelm();
> mylm2 <- makelm();
> identical(mylm1, mylm2); #FALSE
>
> When inspecting both objects there seem to be some rounding differences.
> Setting a seed does not make a difference. Is there any way I can remove
> this randomness and exactly reproduce the object every time?

William Dunlap was correct. Observe in the sequence of comparisons below that the difference in the terms object is what causes identical() to fail. Everything else associated with this model (the coefficients, the R-squared, the covariance matrix, etc.) matches exactly.

> mylm1 <- lm(dist~speed, data=cars);
> mylm2 <- lm(dist~speed, data=cars);
> identical(mylm1, mylm2); #TRUE
[1] TRUE
> makelm <- function(){
+ return(lm(dist~speed, data=cars));
+ }
> mylm1 <- makelm();
> mylm2 <- makelm();
> identical(mylm1, mylm2); #FALSE
[1] FALSE
> identical(coef(mylm1), coef(mylm2))
[1] TRUE
> identical(summary(mylm1), summary(mylm2))
[1] FALSE
> identical(coef(summary(mylm1)), coef(summary(mylm2)))
[1] TRUE
> all.equal(mylm1, mylm2)
[1] TRUE
> identical(summary(mylm1)$r.squared, summary(mylm2)$r.squared)
[1] TRUE
> identical(summary(mylm1)$adj.r.squared, summary(mylm2)$adj.r.squared)
[1] TRUE
> identical(summary(mylm1)$sigma, summary(mylm2)$sigma)
[1] TRUE
> identical(summary(mylm1)$fstatistic, summary(mylm2)$fstatistic)
[1] TRUE
> identical(summary(mylm1)$residuals, summary(mylm2)$residuals)
[1] TRUE
> identical(summary(mylm1)$cov.unscaled, summary(mylm2)$cov.unscaled)
[1] TRUE
> identical(summary(mylm1)$call, summary(mylm2)$call)
[1] TRUE
> identical(summary(mylm1)$terms, summary(mylm2)$terms)
[1] FALSE

> summary(mylm2)$terms
dist ~ speed
attr(,"variables")
list(dist, speed)
attr(,"factors")
      speed
dist      0
speed     1
attr(,"term.labels")
[1] "speed"
attr(,"order")
[1] 1
attr(,"intercept")
[1] 1
attr(,"response")
[1] 1
attr(,".Environment")
<environment: 0x1b76ae0>
attr(,"predvars")
list(dist, speed)
attr(,"dataClasses")
     dist     speed
"numeric" "numeric"

> summary(mylm1)$terms
dist ~ speed
attr(,"variables")
list(dist, speed)
attr(,"factors")
      speed
dist      0
speed     1
attr(,"term.labels")
[1] "speed"
attr(,"order")
[1] 1
attr(,"intercept")
[1] 1
attr(,"response")
[1] 1
attr(,".Environment")
<environment: 0x1cf06b8>
attr(,"predvars")
list(dist, speed)
attr(,"dataClasses")
     dist     speed
"numeric" "numeric"

--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
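The component-by-component comparison above can be automated. The following is a minimal sketch, assuming only base R and the built-in cars data; the variable names `same` and `differing` are made up for illustration and do not come from the thread.

```r
makelm <- function() lm(dist ~ speed, data = cars)
mylm1 <- makelm()
mylm2 <- makelm()

# Compare each list component of the two fits separately.
same <- vapply(names(mylm1),
               function(nm) identical(mylm1[[nm]], mylm2[[nm]]),
               logical(1))
differing <- names(same)[!same]
print(differing)

# The numeric results agree exactly ...
print(identical(coef(mylm1), coef(mylm2)))

# ... but each call to makelm() evaluates the formula in a new function
# frame, so the environments stored with the terms are distinct objects.
print(identical(attr(mylm1$terms, ".Environment"),
                attr(mylm2$terms, ".Environment")))
```

Only components that carry the formula environment should show up in `differing`; the fitted numbers themselves are not involved.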
Re: [Rd] Randomness not due to seed
Date: Tue, 19 Jul 2011 06:13:01 -0700
From: jeroen.o...@stat.ucla.edu
To: r-devel@r-project.org
Subject: [Rd] Randomness not due to seed

> I am working on a reproducible computing platform for which I would like to
> be able to _exactly_ reproduce an R object. However, I am experiencing
> unexpected randomness in some calculations.
> [...]
> When inspecting both objects there seem to be some rounding differences.
> Setting a seed does not make a difference. Is there any way I can remove
> this randomness and exactly reproduce the object every time?

I don't know if anyone had a specific answer for this, but in general floating point is not something for which you want to make bitwise equality tests. You can check the Intel website for some references, but IIRC the FPU can start your calculation with bits or settings (flushing denormals to zero, for example) left over from the last user, although I can't document that.

For example, you can probably find more like this, suggesting that changes in alignment and rounding in preamble code can be significant:

http://software.intel.com/en-us/articles/consistency-of-floating-point-results-using-the-intel-compiler/

And of course, if your algorithm is numerically sensitive, results could change a lot. Now, it's also possible you have uninitialized or corrupt memory, but otherwise you would need to accept that you will not get bitwise reproducibility. You can of course go to Java if you really want that LOL.
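The general point about bitwise equality tests on floating point can be seen in base R with the textbook 0.1 + 0.2 case. This is a small illustration of the principle only; it is not a reproduction of the lm() behaviour discussed in this thread.

```r
x <- 0.1 + 0.2

# Bitwise comparison: the two doubles differ in their last bit.
print(identical(x, 0.3))

# Comparison with a tolerance (all.equal() uses about 1.5e-8 by default).
print(isTRUE(all.equal(x, 0.3)))

# The size of the discrepancy.
print(x - 0.3)
```

all.equal() returns a character description rather than FALSE when objects differ, which is why it is wrapped in isTRUE() here.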
Re: [Rd] Randomness not due to seed
On 11-07-19 8:01 PM, Mike Marchywka wrote:
>> Date: Tue, 19 Jul 2011 06:13:01 -0700
>> From: jeroen.o...@stat.ucla.edu
>> To: r-devel@r-project.org
>> Subject: [Rd] Randomness not due to seed
>>
>> [...]
>> When inspecting both objects there seem to be some rounding differences.
>> Setting a seed does not make a difference. Is there any way I can remove
>> this randomness and exactly reproduce the object every time?
>
> I don't know if anyone had a specific answer for this

I think Bill Dunlap's answer addressed it: the claim appears to be false.

Duncan Murdoch

> but in general floating point is not something for which you want to make
> bitwise equality tests.
> [...]
Re: [Rd] Randomness not due to seed
I would guess the error below is because of Java messing around in the hardware. It's pretty common on Windows for DLLs to attempt to change the precision setting on the floating point processor; I hadn't seen that before on Linux, but that would be my guess as to the cause.

It's also possible that one of the attached packages has messed with R functions somehow, e.g. by replacing the default print() or show() method.

A third possibility is that different math libraries are being used.

So I would consider the differences in the results to be a bit of a bug, but not one that is likely under our control, and not one that is so large that I would worry about working around it.

Duncan Murdoch

On 20/07/2011 8:03 AM, Jeroen Ooms wrote:
>> I think Bill Dunlap's answer addressed it: the claim appears to be false.
>
> Here is another example where there is randomness that is not due to
> the seed. On the same machine, the same R binary, but through another
> interface. First directly in the shell:
>
> > sessionInfo()
> R version 2.13.1 (2011-07-08)
> Platform: i686-pc-linux-gnu (32-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> > set.seed(123)
> > print(coef(lm(dist~speed, data=cars)),digits=22)
>               (Intercept)                     speed
> -17.579094890510951643137   3.932408759124087715975
>
> # And this is through eclipse (java)
>
> > sessionInfo()
> R version 2.13.1 (2011-07-08)
> Platform: i686-pc-linux-gnu (32-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8       LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8
>  [9] LC_ADDRESS=en_US.UTF-8        LC_TELEPHONE=en_US.UTF-8
> [11] LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] rj_0.5.2-1
>
> loaded via a namespace (and not attached):
> [1] rJava_0.9-1  tools_2.13.1
>
> > set.seed(123)
> > print(coef(lm(dist~speed, data=cars)),digits=22)
>              (Intercept)                    speed
> -17.57909489051087703615   3.93240875912408460735
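To put a size on the discrepancy between the two printouts, the printed values can be typed back in and compared. A rough sketch: the four numbers are copied from the output above, the variable names are invented for illustration, and re-parsing 22-digit decimals only approximates the original doubles, so the result indicates magnitude rather than an exact bit count.

```r
# Coefficients as printed in the shell session and in the Eclipse session.
shell   <- c(-17.579094890510951643137, 3.932408759124087715975)
eclipse <- c(-17.57909489051087703615,  3.93240875912408460735)

abs_diff <- abs(shell - eclipse)
rel_diff <- abs_diff / abs(shell)

print(abs_diff)
# Relative difference expressed in multiples of machine epsilon.
print(rel_diff / .Machine$double.eps)
```

Comparing saved objects from the two sessions, rather than their printed forms, would give an exact answer.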
Re: [Rd] Randomness not due to seed
Hi,

Even using the same math libraries you can get different results, depending on what sorts of instructions those libraries use; see the following (non-R-related) blog article: http://blog.nag.com/2011/02/wandering-precision.html.

Martyn

-----Original Message-----
From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On Behalf Of Duncan Murdoch
Sent: 20 July 2011 14:47
To: Jeroen Ooms
Cc: r-devel@r-project.org
Subject: Re: [Rd] Randomness not due to seed

[...]
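One general reason the choice of instructions can matter is that floating-point addition is not associative, so code that accumulates the same terms in a different order (for example after vectorization) may round differently. The snippet below only illustrates that effect in base R; it is not a diagnosis of the specific difference reported in this thread.

```r
a <- (0.1 + 0.2) + 0.3
b <- 0.1 + (0.2 + 0.3)

# Same terms, different grouping: the results differ in the last bit.
print(identical(a, b))
print(sprintf("%.17g", c(a, b)))

# Within a tolerance they are of course equal.
print(isTRUE(all.equal(a, b)))
```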
Re: [Rd] Randomness not due to seed
On 20/07/2011 9:59 AM, Martyn Byng wrote:
> Hi,
>
> Even using the same math libraries you can get different results, depending
> on what sorts of instructions those libraries use, see the following (none R
> related) blog article: http://blog.nag.com/2011/02/wandering-precision.html.

That's another cause that I hadn't considered, also mostly out of our control. (Whoever compiles R does have some control over what optimizations the compiler does, but they might not be aware of them all.)

Duncan Murdoch

> Martyn
>
> [...]
Re: [Rd] Randomness not due to seed
It does not look like your calculation is using the random number generator, so the other responses are probably more to the point. However, beware that setting the seed is not enough to guarantee the same random numbers. You need to also make sure you are using the same uniform RNG and any other generators you use, such as the normal generator. R has a large selection of possibilities. Your start-up settings could change the default behaviour. Also, relying on the default will be a bit risky if you are interested in reproducible calculations, because the R default could change in the future (as it has in the past, and as has the Splus generator in the past).

If the RNG is important for your reproducible calculations then you might want to look at the examples and tests in the setRNG package.

Paul

-----Original Message-----
From: r-devel-boun...@r-project.org [mailto:r-devel-bounces@r-project.org] On Behalf Of jeroen00ms
Sent: July 19, 2011 9:13 AM
To: r-devel@r-project.org
Subject: [Rd] Randomness not due to seed

[...]
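The point that a seed alone does not pin down the random numbers can be checked with base R, which lets set.seed() fix the generator kinds together with the seed. A minimal sketch using only base functions (not the setRNG package mentioned above); the variable names are illustrative.

```r
old_kinds <- RNGkind()  # remember the generators currently in use

# Same seed, same uniform and normal generators: identical draws.
set.seed(123, kind = "Mersenne-Twister", normal.kind = "Inversion")
draw_a <- rnorm(3)
set.seed(123, kind = "Mersenne-Twister", normal.kind = "Inversion")
draw_b <- rnorm(3)

# Same seed, different normal generator: different draws.
set.seed(123, kind = "Mersenne-Twister", normal.kind = "Box-Muller")
draw_c <- rnorm(3)

print(identical(draw_a, draw_b))
print(identical(draw_a, draw_c))

# Restore the previous uniform and normal generators.
RNGkind(kind = old_kinds[1], normal.kind = old_kinds[2])
```

Stating the kinds explicitly in a script protects it against changed start-up settings and against future changes in R's defaults.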
Re: [Rd] Randomness not due to seed
On Jul 20, 2011, at 15:38 , Dirk Eddelbuettel wrote:

> On 20 July 2011 at 14:03, Jeroen Ooms wrote:
> | > I think Bill Dunlap's answer addressed it: the claim appears to be false.
> |
> | Here is another example where there is randomness that is not due to
> | the seed. On the same machine, the same R binary, but through another
> | interface. First directly in the shell:
> |
> | [...]
> |
> | > set.seed(123)
> | > print(coef(lm(dist~speed, data=cars)),digits=22)
> |               (Intercept)                     speed
> | -17.579094890510951643137   3.932408759124087715975
>
> That's PBKAC --- even double precision does NOT get you 22 digits precision.

Hmm, yes, but you would expect the SAME function on the SAME data to yield the same floating point number, and give the SAME printout on the SAME R on the SAME hardware...

FWIW all the Mac versions that I can access give the same results as the eclipse version.

Let's look at the numbers side-by-side

-17.579094890510951643137   3.932408759124087715975
-17.57909489051087703615    3.93240875912408460735
                !                           !
 12.345678901234567890123   1.234567890123456789012

so we're seeing differences around the 15th/16th significant digit. This is consistent with a difference of about one unit of least precision in the actual objects, but there could conceivably be other explanations, e.g. the print() function picking up random garbage.

Jeroen: Could you save() the results from the two cases, load() them in a new session and compute the difference?

> You may want to read up on 'what every computer scientist should know about
> floating point arithmetic' by Goldberg (which is both a true internet classic)
> and ponder why a common setting for the various 'epsilon' settings of general
> convergence is set to of the constants supplied by the OS and/or its C
> library. R has
>
> #define SINGLE_EPS FLT_EPSILON
> [...]
> #define DOUBLE_EPS DBL_EPSILON
>
> in Constants.h. You can then chase the definition of FLT_EPSILON and
> DBL_EPSILON through your system headers (which is a good exercise).
>
> One place you may end up in the manual -- the following from the GNU libc
> documentation on Floating Point Parameters:
>
> FLT_EPSILON
> This is the minimum positive floating point number of type float such that
> 1.0 + FLT_EPSILON != 1.0 is true. It's supposed to be no greater than 1E-5.
>
> DBL_EPSILON
> LDBL_EPSILON
> These are similar to FLT_EPSILON, but for the data types double and long
> double, respectively. The type of the macro's value is the same as the type
> it describes. The values are not supposed to be greater than 1E-9.
>
> So there -- nine digits.
>
> Dirk
>
> | # And this is through eclipse (java)
> |
> | [...]
>
> --
> Gauss once played himself in a zero-sum game and won $50.
> -- #11 at http://www.gaussfacts.com

--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk Priv: pda...@gmail.com
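The save()/load() check requested above could look roughly like the sketch below. The file names and the use of tempdir() are assumptions made for illustration; in practice the two save() steps would run in the two different interfaces and write to a directory both can reach, since tempdir() is specific to one session.

```r
out_dir <- tempdir()  # hypothetical location; use a shared directory in practice

# Step 1: run in the first interface (e.g. the shell).
coef_shell <- coef(lm(dist ~ speed, data = cars))
save(coef_shell, file = file.path(out_dir, "coef_shell.RData"))

# Step 2: run in the second interface (e.g. Eclipse).
coef_eclipse <- coef(lm(dist ~ speed, data = cars))
save(coef_eclipse, file = file.path(out_dir, "coef_eclipse.RData"))

# Step 3: in a new session, load both objects and compare them directly,
# which takes print() out of the picture.
rm(coef_shell, coef_eclipse)
load(file.path(out_dir, "coef_shell.RData"))
load(file.path(out_dir, "coef_eclipse.RData"))

print(coef_shell - coef_eclipse)
print(sprintf("%a", coef_shell))    # exact hexadecimal representation
print(sprintf("%a", coef_eclipse))
```

When all three steps run in a single session, as here, the difference is zero; a nonzero difference between objects saved from two interfaces would show that the stored doubles themselves differ.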
Re: [Rd] Randomness not due to seed
On 20 July 2011 at 18:02, peter dalgaard wrote:
|
| On Jul 20, 2011, at 15:38 , Dirk Eddelbuettel wrote:
|
| [...]
| > That's PBKAC --- even double precision does NOT get you 22 digits precision.
|
| Hmm, yes, but you would expect the SAME function on the SAME data to yield the same floating point number, and give the SAME printout on the SAME R on the SAME hardware...
|
| [...]
|
| so we're seeing differences around the 15th/16th significant digit. This is consistent with a difference of about one unit of least precision in the actual objects, but there could conceivably be other explanations, e.g. the print() function picking up random garbage.
|
| Jeroen: Could you save() the results from the two cases, load() them in a new session and compute the difference?

Yes, 15 to 16 is common. I should have added that to my post when I said '22 is too much'. And I did not want to give the impression that nine is what one gets; nine is the minimum as per the libc docs I quoted, but as you illustrate, 15 to 16 can often be had.

Thanks for the follow-up.

Dirk

| [...]
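The figures discussed here can be inspected from within R through the .Machine list. A short sketch, assuming the usual IEEE 754 double-precision platform; the 1E-9 quoted from the libc manual is only an upper bound that the C standard places on DBL_EPSILON.

```r
eps <- .Machine$double.eps
print(eps)

# eps is the smallest power-of-two step that changes 1.0 when added to it.
print(1 + eps != 1)
print(1 + eps / 2 == 1)

# Number of binary digits in the significand, and the corresponding
# number of decimal digits.
print(.Machine$double.digits)
print(-log10(eps))
```

This is why printing with digits=22 shows more decimal places than a double can meaningfully carry.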
Re: [Rd] Randomness not due to seed
Did you actually see some rounding differences? The lm objects made in the calls to makelm will differ in the environments attached to the formula (because you made the formula in the function). If I change both copies of that .Environment attribute to .GlobalEnv (or any other environment), then identical reports the objects are the same:

> attr(attr(mylm1$model, "terms"), ".Environment") <- .GlobalEnv
> attr(mylm1$terms, ".Environment") <- .GlobalEnv
> attr(attr(mylm2$model, "terms"), ".Environment") <- .GlobalEnv
> attr(mylm2$terms, ".Environment") <- .GlobalEnv
> identical(mylm1, mylm2)
[1] TRUE

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On Behalf Of jeroen00ms
Sent: Tuesday, July 19, 2011 6:13 AM
To: r-devel@r-project.org
Subject: [Rd] Randomness not due to seed

[...]
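The four assignments above can be wrapped in a small helper when exact object reproduction is the goal. This is a sketch only: `reset_formula_env` is a made-up name, not an R or package function, and it simply repeats the same attribute changes on one fit.

```r
# Hypothetical helper: point both stored copies of the terms at one
# fixed environment (the global environment by default).
reset_formula_env <- function(fit, env = globalenv()) {
  attr(fit$terms, ".Environment") <- env
  attr(attr(fit$model, "terms"), ".Environment") <- env
  fit
}

makelm <- function() lm(dist ~ speed, data = cars)

mylm1 <- reset_formula_env(makelm())
mylm2 <- reset_formula_env(makelm())
print(identical(mylm1, mylm2))
```

One caveat: functions that later look up variables through the formula's environment will then search the replacement environment instead of the original function frame, so this suits archiving and comparison better than further modelling.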