Hi developers,


After some investigation I have found that the same object can produce files 
of very different sizes when saved as an external "xx.RData" file. The 
immediate repercussion is that your .RData workspace can grow for no apparent 
reason.



The function and its three scenarios below highlight these discrepancies. Note 
that the fitted model being returned is the same in each scenario, and 
object.size() reports the same size for all three. The first scenario simply 
fits a set of lm() models in a loop to a simulated set of data. The second 
also stores a copy of a reasonably large matrix in a list on each iteration. 
The third shows exactly where the discrepancy lies: when the object is saved 
to an "xx.RData" file it appears to be still burdened, in some capacity, with 
the objects created inside the function. Only deleting these objects at the 
end of the function brings the saved file down to a realistic size. Running 
gc() after each of these short simulations shows that the "Vcells" accumulated 
in the function environment appear to remain after the function returns. These 
leftovers are then written to the .RData file when the object(s) are saved. I 
see this on both of the systems I use: Windows 7 (R 2.10.1) and 64-bit Ubuntu 
Linux (R 2.9.0).
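
The same effect can be reproduced in miniature without writing any files. The 
sketch below is only an illustration: serialized_bytes() and toy_fit() are 
hypothetical names that are not part of the simulation in this post, and the 
in-memory serialized length is used as a rough stand-in for what save() writes 
before compression.

```r
# Hypothetical helper: bytes needed to serialize an object in memory,
# used here as a proxy for the uncompressed content that save() writes.
serialized_bytes <- function(x) length(serialize(x, connection = NULL))

# Toy stand-in for lmfunc(): fits one small model next to a large local vector.
toy_fit <- function(cleanup = FALSE) {
  set.seed(100)
  dat <- data.frame(y = rnorm(50), x = rnorm(50))
  big <- rnorm(1e5)              # about 0.8 Mb of doubles, never used by the fit
  fm <- lm(y ~ x, data = dat)
  if (cleanup) rm(big, dat)      # drop the locals before returning
  fm
}

fit_kept    <- toy_fit()
fit_cleaned <- toy_fit(cleanup = TRUE)

# Sizes as reported by object.size()
c(kept    = as.numeric(object.size(fit_kept)),
  cleaned = as.numeric(object.size(fit_cleaned)))

# Sizes of the serialized objects
c(kept    = serialized_bytes(fit_kept),
  cleaned = serialized_bytes(fit_cleaned))
```

If the behaviour described above holds, the two serialized sizes should differ 
by at least the size of the unused vector, while object.size() gives no hint 
of the difference.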



A similar problem was partially pointed out four years ago



http://tolstoy.newcastle.edu.au/R/help/06/03/24060.html



and has been made more obvious in the scenarios given below.



Admittedly I have had many problems with workspace .RData sizes over the years 
and it has taken me some time to realise what is actually occurring. Can 
someone enlighten me and my colleagues as to why the objects created and 
evaluated inside a function call are saved, in some capacity, with the 
returned object?



Cheers,

Julian



####################### small simulation from a clean directory



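lmfunc <- function(loop = 20, add = FALSE, gr = FALSE){
  lmlist <- rmlist <- list()
  set.seed(100)
  # simulated data: 100 observations, response "y" plus 99 predictors
  dat <- data.frame(matrix(rnorm(100*100), ncol = 100))
  # reasonably large matrix (100 x 1000), stored repeatedly when add = TRUE;
  # named "rmat" so that it does not mask the name of base::rm()
  rmat <- matrix(rnorm(100000), ncol = 1000)
  names(dat)[1] <- "y"
  for(i in seq_len(loop)) {
    lmlist[[i]] <- lm(y ~ ., data = dat)
    if(add)
      rmlist[[i]] <- rmat
  }
  fm <- lmlist[[loop]]
  if(gr) {
    # list everything in this function's own frame and remove all but "fm";
    # ls() is used instead of sys.frame(1), which only refers to this frame
    # when lmfunc() is called directly from the top level
    what <- ls()
    print(what)
    remove(list = setdiff(what, "fm"))
  }
  fm
}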



# baseline gc()

> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 153325  4.1     350000  9.4   350000  9.4
Vcells  99228  0.8     786432  6.0   386446  3.0



###### 1. simple lm() simulation

> lmtest1 <- lmfunc()
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 184470  5.0     407500 10.9   350000  9.4
Vcells 842169  6.5    1300721 10.0  1162577  8.9

> save(lmtest1, file = "lm1.RData")
> system("ls -s lm1.RData")
4312 lm1.RData

## A moderate increase in Vcells; .RData file is around 4.3 Mb



###### 2. store a large matrix in a list within the loop

> lmtest2 <- lmfunc(add = TRUE)
> gc()
           used (Mb) gc trigger (Mb) max used (Mb)
Ncells  209316  5.6     407500 10.9   405340 10.9
Vcells 3584244 27.4    4175939 31.9  3900869 29.8

> save(lmtest2, file = "lm2.RData")
> system("ls -s lm2.RData")
19324 lm2.RData

## An enormous increase in Vcells; .RData file is now 19 Mb+



###### 3. delete all objects in the function's frame before returning

> lmtest3 <- lmfunc(add = TRUE, gr = TRUE)
> gc()
           used (Mb) gc trigger (Mb) max used (Mb)
Ncells  210766  5.7     467875 12.5   467875 12.5
Vcells 3615863 27.6    6933688 52.9  6898609 52.7

> save(lmtest3, file = "lm3.RData")
> system("ls -s lm3.RData")
320 lm3.RData

## A minimal increase in Vcells; .RData file is now 320 Kb



> sapply(ls(pattern = "^lmtest"), function(x) object.size(get(x, envir = .GlobalEnv)))
lmtest1 lmtest2 lmtest3
 358428  358428  358428

## all objects are deemed the same size by object.size()

######################### End sim
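
One place to look, on the assumption (not established anywhere in this post) 
that the extra content travels with the environment recorded alongside the 
model formula, is the environment attached to the terms of the fit. The sketch 
below uses a hypothetical toy_fit() rather than lmfunc() and simply lists what 
is still reachable from the returned model.

```r
# Hypothetical toy function: returns a fit while large locals still exist.
toy_fit <- function() {
  dat <- data.frame(y = rnorm(20), x = rnorm(20))
  big <- numeric(1e5)            # large local object, unrelated to the model
  lm(y ~ x, data = dat)
}

fit <- toy_fit()

# Environment recorded with the model's terms (the ".Environment" attribute)
env <- environment(terms(fit))

# Names still reachable from the returned fit through that environment
ls(env)

# Is that environment the global workspace, or something else?
identical(env, globalenv())
```

Running the same two inspection lines on lmtest1, lmtest2 and lmtest3 would 
show whether the objects created inside lmfunc() are still attached to each 
returned model.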

--
---
Dr. Julian Taylor                     phone: +61 8 8303 8792
Postdoctoral Fellow                     fax: +61 8 8303 8763
CMIS, CSIRO                          mobile: +61 4 1638 8180
Private Mail Bag 2                    email: julian.tay...@csiro.au
Glen Osmond, SA, 5064
---



______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
