Re: [Rd] Randomness not due to seed

2011-07-25 Thread Paul Johnson
On Tue, Jul 19, 2011 at 8:13 AM, jeroen00ms jeroen.o...@stat.ucla.edu wrote:
 I am working on a reproducible computing platform for which I would like to
 be able to _exactly_ reproduce an R object. However, I am experiencing
 unexpected randomness in some calculations. I have a hard time finding out
 exactly how it occurs. The code below illustrates the issue.

 mylm1 <- lm(dist~speed, data=cars);
 mylm2 <- lm(dist~speed, data=cars);
 identical(mylm1, mylm2); #TRUE

 makelm <- function(){
        return(lm(dist~speed, data=cars));
 }

 mylm1 <- makelm();
 mylm2 <- makelm();
 identical(mylm1, mylm2); #FALSE

 When inspecting both objects there seem to be some rounding differences.
 Setting a seed does not make a difference. Is there any way I can remove
 this randomness and exactly reproduce the object every time?


William Dunlap was correct.  As the sequence of comparisons below
shows, the difference in the terms object is what causes identical()
to fail. Everything else associated with this model (the coefficients,
the R-squared, the covariance matrix, etc.) matches exactly.


> mylm1 <- lm(dist~speed, data=cars);
> mylm2 <- lm(dist~speed, data=cars);
> identical(mylm1, mylm2); #TRUE
[1] TRUE
> makelm <- function(){
+     return(lm(dist~speed, data=cars));
+ }
> mylm1 <- makelm();
> mylm2 <- makelm();
> identical(mylm1, mylm2); #FALSE
[1] FALSE
> identical(coef(mylm1), coef(mylm2))
[1] TRUE
> identical(summary(mylm1), summary(mylm2))
[1] FALSE
> identical(coef(summary(mylm1)), coef(summary(mylm2)))
[1] TRUE
> all.equal(mylm1, mylm2)
[1] TRUE
> identical(summary(mylm1)$r.squared, summary(mylm2)$r.squared)
[1] TRUE
> identical(summary(mylm1)$adj.r.squared, summary(mylm2)$adj.r.squared)
[1] TRUE
> identical(summary(mylm1)$sigma, summary(mylm2)$sigma)
[1] TRUE
> identical(summary(mylm1)$fstatistic, summary(mylm2)$fstatistic)
[1] TRUE
> identical(summary(mylm1)$residuals, summary(mylm2)$residuals)
[1] TRUE
> identical(summary(mylm1)$cov.unscaled, summary(mylm2)$cov.unscaled)
[1] TRUE
> identical(summary(mylm1)$call, summary(mylm2)$call)
[1] TRUE
> identical(summary(mylm1)$terms, summary(mylm2)$terms)
[1] FALSE

> summary(mylm2)$terms
dist ~ speed
attr(,"variables")
list(dist, speed)
attr(,"factors")
      speed
dist      0
speed     1
attr(,"term.labels")
[1] "speed"
attr(,"order")
[1] 1
attr(,"intercept")
[1] 1
attr(,"response")
[1] 1
attr(,".Environment")
<environment: 0x1b76ae0>
attr(,"predvars")
list(dist, speed)
attr(,"dataClasses")
     dist     speed
"numeric" "numeric"

> summary(mylm1)$terms
dist ~ speed
attr(,"variables")
list(dist, speed)
attr(,"factors")
      speed
dist      0
speed     1
attr(,"term.labels")
[1] "speed"
attr(,"order")
[1] 1
attr(,"intercept")
[1] 1
attr(,"response")
[1] 1
attr(,".Environment")
<environment: 0x1cf06b8>
attr(,"predvars")
list(dist, speed)
attr(,"dataClasses")
     dist     speed
"numeric" "numeric"
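
The component-by-component search above can also be done in one pass. The following is only a sketch using base R and the built-in cars data; the variable name `same` is made up for illustration and is not from the original posts.

```r
makelm <- function() {
  lm(dist ~ speed, data = cars)
}
mylm1 <- makelm()
mylm2 <- makelm()

# Test every top-level component of the two fits separately.
same <- vapply(names(mylm1),
               function(nm) identical(mylm1[[nm]], mylm2[[nm]]),
               logical(1))
names(same)[!same]   # names of the components that are not identical

# Each call to makelm() has its own evaluation frame, and a formula
# created inside the call keeps that frame as its environment.
identical(environment(formula(mylm1)), environment(formula(mylm2)))
```

Any component reported here can then be inspected with attributes() to see which attribute carries the environment.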




-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Randomness not due to seed

2011-07-20 Thread Mike Marchywka









I don't know if anyone had a specific answer for this, but in general
floating point is not something for which you want to make bitwise
equality tests. You can check the Intel website for some references, but
IIRC the FPU can start your calculation with bits or settings (flushing
denorms to zero, for example) left over from the last user, although I
can't document that.

For example, you can probably find more like this suggesting that changes
in alignment and rounding in preamble code can be significant:

http://software.intel.com/en-us/articles/consistency-of-floating-point-results-using-the-intel-compiler/

And of course if your algorithm is numerically sensitive, results could
change a lot. Now it's also possible you have uninitialized or corrupt
memory, but you would need to consider that you will not get bitwise
reproducibility. You can of course go to Java if you really want that LOL.
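
To illustrate the general point about bitwise equality tests on floating point values, here is a small base R sketch. It is not from the original post; the factor of 10 in the last line is an arbitrary tolerance chosen for the example.

```r
x <- 0.1 + 0.2

# Bitwise comparison: the sum is not the same double as the literal 0.3.
identical(x, 0.3)

# Comparison up to a numerical tolerance, which is what all.equal() does.
isTRUE(all.equal(x, 0.3))

# The same idea written out by hand with an explicit tolerance.
abs(x - 0.3) < 10 * .Machine$double.eps
```

Note that identical() on an lm object compares far more than numbers, so a FALSE result does not by itself show that any floating point value differs.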









Re: [Rd] Randomness not due to seed

2011-07-20 Thread Duncan Murdoch

On 11-07-19 8:01 PM, Mike Marchywka wrote:












I don't know if anyone had a specific answer for this


I think Bill Dunlap's answer addressed it:  the claim appears to be false.

Duncan Murdoch



Re: [Rd] Randomness not due to seed

2011-07-20 Thread Duncan Murdoch
I would guess the error below is because of Java messing around in the 
hardware.  It's pretty common on Windows for DLLs to attempt to change 
the precision setting on the floating point processor; I hadn't seen 
that before on Linux, but that would be my guess as to the cause.


It's also possible that one of the attached packages has messed with R 
functions somehow, e.g. by replacing the default print() or show() method.


A third possibility is that different math libraries are being used.

So I would consider the differences in the results to be a bit of a bug, 
but not one that is likely under our control, and not one that is so 
large that I would worry about working around it.
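
A rough way to check the second and third possibilities from inside each session is sketched below. It uses only base R functions; which outputs to compare between the shell and the Eclipse session is a suggestion, not something taken from the thread.

```r
# Where is print() found on the search path? A package that masked it
# would be listed ahead of "package:base".
find("print")

# Version of the LAPACK library this R session is using.
La_version()

# Format the coefficients without going through print(): 17 significant
# digits are enough to tell any two distinct doubles apart.
x <- coef(lm(dist ~ speed, data = cars))
sprintf("%.17g", x)

# The raw bytes of the stored doubles, for an exact comparison.
writeBin(unname(x), raw())
```

If the sprintf() strings agree between the two sessions while print() output differs, the display code is the suspect; if they differ, the computation itself produced different doubles.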


Duncan Murdoch

On 20/07/2011 8:03 AM, Jeroen Ooms wrote:

  I think Bill Dunlap's answer addressed it:  the claim appears to be false.

Here is another example where there is randomness that is not due to
the seed. On the same machine, the same R binary, but through another
interface. First directly in the shell:

> sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: i686-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

> set.seed(123)
> print(coef(lm(dist~speed, data=cars)),digits=22)
              (Intercept)                     speed
-17.579094890510951643137   3.932408759124087715975



# And this is through eclipse (java)

> sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: i686-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=en_US.UTF-8
 [9] LC_ADDRESS=en_US.UTF-8     LC_TELEPHONE=en_US.UTF-8
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] rj_0.5.2-1

loaded via a namespace (and not attached):
[1] rJava_0.9-1  tools_2.13.1

> set.seed(123)
> print(coef(lm(dist~speed, data=cars)),digits=22)
             (Intercept)                    speed
-17.57909489051087703615   3.93240875912408460735




Re: [Rd] Randomness not due to seed

2011-07-20 Thread Martyn Byng
Hi,

Even using the same math libraries you can get different results,
depending on what sorts of instructions those libraries use; see the
following (non-R-related) blog article:
http://blog.nag.com/2011/02/wandering-precision.html.

Martyn



Re: [Rd] Randomness not due to seed

2011-07-20 Thread Duncan Murdoch

On 20/07/2011 9:59 AM, Martyn Byng wrote:

Hi,

Even using the same math libraries you can get different results,
depending on what sorts of instructions those libraries use; see the
following (non-R-related) blog article:
http://blog.nag.com/2011/02/wandering-precision.html.


That's another cause that I hadn't considered, also mostly out of our 
control.  (Whoever compiles R does have some control over what 
optimizations the compiler does, but they might not be aware of them all.)


Duncan Murdoch




Re: [Rd] Randomness not due to seed

2011-07-20 Thread Paul Gilbert
It does not look like your calculation is using the random number generator, so 
the other responses are probably more to the point.  

However, beware that setting the seed is not enough to guarantee the same 
random numbers. You need to also make sure you are using the same uniform RNG 
and any other generators you use, such as the normal generator. R has a large 
selection of possibilities. Your start up settings could change the default 
behaviour. Also, relying on the default will be a bit risky if you are 
interested in reproducible calculations, because the R default could change in 
the future (as it has in the past, and as has the Splus generator in the past).

If the RNG is important for your reproducible calculations then you might want 
to look at the examples and tests in the setRNG package.
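
As a small illustration of why the seed alone is not enough, the sketch below pins the generators explicitly with base R's RNGkind() and then switches only the normal generator. It does not use the setRNG package; the seed value and sample size are arbitrary.

```r
old <- RNGkind()   # remember the generators currently in use

# Fix both the uniform and the normal generator, then seed.
RNGkind(kind = "Mersenne-Twister", normal.kind = "Inversion")
set.seed(123); x <- rnorm(3)
set.seed(123); y <- rnorm(3)
identical(x, y)    # TRUE: same generators, same seed

# Same seed, same uniform generator, different normal generator.
RNGkind(normal.kind = "Box-Muller")
set.seed(123); z <- rnorm(3)
identical(x, z)    # FALSE: the seed did not determine the result alone

# Restore the previous generators.
RNGkind(kind = old[1], normal.kind = old[2])
```

Recording the output of RNGkind() alongside the seed is one way to make the generator choice part of what gets reproduced.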

Paul



Re: [Rd] Randomness not due to seed

2011-07-20 Thread peter dalgaard

On Jul 20, 2011, at 15:38 , Dirk Eddelbuettel wrote:

 
 On 20 July 2011 at 14:03, Jeroen Ooms wrote:
 |  I think Bill Dunlap's answer addressed it:  the claim appears to be
 |  false.
 | 
 | Here is another example where there is randomness that is not due to
 | the seed. On the same machine, the same R binary, but through another
 | interface. First directly in the shell:
 | 
 | > set.seed(123)
 | > print(coef(lm(dist~speed, data=cars)),digits=22)
 |               (Intercept)                     speed
 | -17.579094890510951643137   3.932408759124087715975
 
 That's PBKAC --- even double precision does NOT get you 22 digits precision.

Hmm, yes, but you would expect the SAME function on the SAME data to yield the 
same floating point number, and give the SAME printout on the SAME R on the 
SAME hardware... 

FWIW all the Mac versions that I can access give the same results as the 
eclipse version.

Let's look at the numbers side-by-side

-17.579094890510951643137   3.932408759124087715975
-17.57909489051087703615    3.93240875912408460735
                !                           !
 12.345678901234567890123   1.234567890123456789012

so we're seeing differences around the 15th/16th significant digit. This is 
consistent with a difference of about one unit of least precision in the actual 
objects, but there could conceivably be other explanations, e.g. the print() 
function picking up random garbage. Jeroen: Could you save() the results from 
the two cases, load() them in a new session and compute the difference?
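
A sketch of that check follows. It makes several assumptions: the two coefficient vectors are typed in from the printouts in this thread rather than computed in two real sessions, saveRDS()/readRDS() stand in for save()/load(), and the file names are temporary files invented for the example.

```r
# Values copied from the shell and Eclipse printouts quoted above.
shell   <- c("(Intercept)" = -17.579094890510951643137,
             speed         =   3.932408759124087715975)
eclipse <- c("(Intercept)" = -17.57909489051087703615,
             speed         =   3.93240875912408460735)

# In a real test each session would write its own result, e.g.
#   saveRDS(coef(lm(dist ~ speed, data = cars)), "coef_shell.rds")
# and a third, fresh session would read both files back.
f_shell   <- tempfile(fileext = ".rds")
f_eclipse <- tempfile(fileext = ".rds")
saveRDS(shell, f_shell)
saveRDS(eclipse, f_eclipse)
a <- readRDS(f_shell)
b <- readRDS(f_eclipse)

# Absolute difference between the stored doubles.
d <- a - b
d

# Spacing between adjacent doubles near each value (one unit in the
# last place), and the difference expressed in those units.
ulp <- 2^(floor(log2(abs(a))) - 52)
d / ulp
```

Because the comparison is done on the stored objects, print() is taken out of the picture, which separates a display problem from a genuine difference in the computed values.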

  
 
 You may want to read up on 'What Every Computer Scientist Should Know About
 Floating-Point Arithmetic' by Goldberg (which is a true internet classic)
 and ponder why a common setting for the various 'epsilon' settings of general
 convergence is set to one of the constants supplied by the OS and/or its C
 library. R has
 
  #define SINGLE_EPS FLT_EPSILON
  [...]
  #define DOUBLE_EPS DBL_EPSILON
 
 in Constants.h. You can then chase the definition of FLT_EPSILON and
 DBL_EPSILON through your system headers (which is a good exercise).
 
 One place you may end up is the manual. The following is from the GNU libc
 documentation on Floating Point Parameters:
 
 FLT_EPSILON
 This is the minimum positive floating point number of type float such that
 1.0 + FLT_EPSILON != 1.0 is true. It's supposed to be no greater than
 1E-5.
 
 DBL_EPSILON
 LDBL_EPSILON
 These are similar to FLT_EPSILON, but for the data types double and long
 double, respectively. The type of the macro's value is the same as the
 type it describes. The values are not supposed to be greater than 1E-9.
 
 So there -- nine digits. 
 
 Dirk 
 
 

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [Rd] Randomness not due to seed

2011-07-20 Thread Dirk Eddelbuettel

On 20 July 2011 at 18:02, peter dalgaard wrote:
| so we're seeing differences around the 15th/16th significant digit. This is
| consistent with a difference of about one unit of least precision in the
| actual objects, but there could conceivably be other explanations, e.g. the
| print() function picking up random garbage. Jeroen: Could you save() the
| results from the two cases, load() them in a new session and compute the
| difference?

Yes, 15 to 16 is common.  I should have added that to my post when I said '22
is too much'. And I did not want to give the impression that nine is what one
gets; nine is the minimum as per the libc docs I quoted, but as you
illustrate, 15 to 16 can often be had.

Thanks for the follow-up.

Dirk
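
For reference, the double precision epsilon can be inspected from R itself. This is a base R sketch added for illustration; it assumes the usual IEEE 754 double, which is what R uses on current platforms.

```r
# Machine epsilon for doubles: the gap between 1 and the next double.
.Machine$double.eps
.Machine$double.eps == 2^-52

# Adding epsilon to 1 changes the value; adding half of it does not.
1 + .Machine$double.eps != 1
1 + .Machine$double.eps / 2 == 1

# Number of significand bits and the decimal digits that corresponds to.
.Machine$double.digits
-log10(.Machine$double.eps)
```

The last line evaluates to a little under 16, which matches the 15 to 16 significant digits discussed above.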

 

Re: [Rd] Randomness not due to seed

2011-07-19 Thread William Dunlap
Did you actually see some rounding differences?

The lm objects made in the calls to makelm will
differ in the environments attached to the formula
(because you made the formula in the function).  If
I change both copies of that .Environment attribute
to .GlobalEnv (or any other environment), then identical
reports the objects are the same:

  > attr(attr(mylm1$model, "terms"), ".Environment") <- .GlobalEnv
  > attr(mylm1$terms, ".Environment") <- .GlobalEnv
  > attr(attr(mylm2$model, "terms"), ".Environment") <- .GlobalEnv
  > attr(mylm2$terms, ".Environment") <- .GlobalEnv
  > identical(mylm1, mylm2)
  [1] TRUE
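
The four assignments can be wrapped in a small helper so the comparison is repeatable. This is only a sketch: `reset_env` is a hypothetical name, not a function from R or from this thread, and it touches only the two attributes named above.

```r
makelm <- function() {
  lm(dist ~ speed, data = cars)
}

# Hypothetical helper: point both stored copies of the formula
# environment at one fixed environment before comparing.
reset_env <- function(fit, env = globalenv()) {
  attr(fit$terms, ".Environment") <- env
  attr(attr(fit$model, "terms"), ".Environment") <- env
  fit
}

mylm1 <- makelm()
mylm2 <- makelm()
identical(mylm1, mylm2)                        # environments differ
identical(reset_env(mylm1), reset_env(mylm2))  # compared without them
```

An alternative with the same effect would be to create the formula outside the function and pass it in, so that every fit shares one environment from the start.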

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 
