On Tue, Jul 19, 2011 at 8:13 AM, jeroen00ms <jeroen.o...@stat.ucla.edu> wrote:
> I am working on a reproducible computing platform for which I would like to
> be able to _exactly_ reproduce an R object. However, I am experiencing
> unexpected randomness in some calculations. I have a hard time finding out
> exactly how it occurs. The code below illustrates the issue.
>
> mylm1 <- lm(dist~speed, data=cars);
> mylm2 <- lm(dist~speed, data=cars);
> identical(mylm1, mylm2); #TRUE
>
> makelm <- function(){
>   return(lm(dist~speed, data=cars));
> }
>
> mylm1 <- makelm();
> mylm2 <- makelm();
> identical(mylm1, mylm2); #FALSE
>
> When inspecting both objects there seem to be some rounding differences.
> Setting a seed does not make a difference. Is there any way I can remove
> this randomness and exactly reproduce the object every time?
>
William Dunlap was correct. As the sequence of comparisons below shows, the difference in the "terms" component is what makes identical() return FALSE. Everything else associated with this model (the coefficients, the R-squared, the covariance matrix, etc.) matches exactly.

> mylm1 <- lm(dist~speed, data=cars);
> mylm2 <- lm(dist~speed, data=cars);
> identical(mylm1, mylm2); #TRUE
[1] TRUE
> makelm <- function(){
+ return(lm(dist~speed, data=cars));
+ }
> mylm1 <- makelm();
> mylm2 <- makelm();
> identical(mylm1, mylm2); #FALSE
[1] FALSE
> identical(coef(mylm1), coef(mylm2))
[1] TRUE
> identical(summary(mylm1), summary(mylm2))
[1] FALSE
> identical(coef(summary(mylm1)), coef(summary(mylm2)))
[1] TRUE
> all.equal(mylm1, mylm2)
[1] TRUE
> identical(summary(mylm1)$r.squared, summary(mylm2)$r.squared)
[1] TRUE
> identical(summary(mylm1)$adj.r.squared, summary(mylm2)$adj.r.squared)
[1] TRUE
> identical(summary(mylm1)$sigma, summary(mylm2)$sigma)
[1] TRUE
> identical(summary(mylm1)$fstatistic, summary(mylm2)$fstatistic)
[1] TRUE
> identical(summary(mylm1)$residuals, summary(mylm2)$residuals)
[1] TRUE
> identical(summary(mylm1)$cov.unscaled, summary(mylm2)$cov.unscaled)
[1] TRUE
> identical(summary(mylm1)$call, summary(mylm2)$call)
[1] TRUE
> identical(summary(mylm1)$terms, summary(mylm2)$terms)
[1] FALSE
> summary(mylm2)$terms
dist ~ speed
attr(,"variables")
list(dist, speed)
attr(,"factors")
      speed
dist      0
speed     1
attr(,"term.labels")
[1] "speed"
attr(,"order")
[1] 1
attr(,"intercept")
[1] 1
attr(,"response")
[1] 1
attr(,".Environment")
<environment: 0x1b76ae0>
attr(,"predvars")
list(dist, speed)
attr(,"dataClasses")
     dist     speed
"numeric" "numeric"
>
> summary(mylm1)$terms
dist ~ speed
attr(,"variables")
list(dist, speed)
attr(,"factors")
      speed
dist      0
speed     1
attr(,"term.labels")
[1] "speed"
attr(,"order")
[1] 1
attr(,"intercept")
[1] 1
attr(,"response")
[1] 1
attr(,".Environment")
<environment: 0x1cf06b8>
attr(,"predvars")
list(dist, speed)
attr(,"dataClasses")
     dist     speed
"numeric" "numeric"

The two printouts differ only in the ".Environment" attribute (0x1b76ae0 versus 0x1cf06b8). A formula records the environment in which it was created. Inside makelm() that is the function's evaluation frame, which is new on every call, so each fit carries a different environment. At top level both formulas are created in the global environment, which is why the first comparison returns TRUE. The mismatch is therefore not numerical.
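For anyone who needs bit-identical fits from a wrapper function, here is a minimal sketch of one possible workaround, assuming the goal is simply that identical() return TRUE. It first confirms that the two terms objects carry different environments, then defines a hypothetical makelm_fixed() (my name, not part of any package) that pins the formula's environment to globalenv() before calling lm().

```r
# Same idea as in the original post: dist ~ speed is created inside
# makelm(), so its environment is that call's evaluation frame.
makelm <- function() {
  lm(dist ~ speed, data = cars)
}

mylm1 <- makelm()
mylm2 <- makelm()

# For a non-function, environment() returns the ".Environment" attribute;
# here the two terms objects point to two different call frames.
identical(environment(mylm1$terms), environment(mylm2$terms))  # FALSE

# Hypothetical workaround: give the formula a fixed environment so every
# call stores the same .Environment attribute in the fitted object.
makelm_fixed <- function() {
  f <- dist ~ speed
  environment(f) <- globalenv()
  lm(f, data = cars)
}

fit1 <- makelm_fixed()
fit2 <- makelm_fixed()
identical(fit1, fit2)  # TRUE
```

Note the trade-off: once the environment is pinned to globalenv(), any variable in the formula that is not found in `data` is looked up in the global environment rather than inside the function. If exact object identity is not essential, all.equal(), as the transcript shows, already treats the two original fits as equal.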
--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel