We have a project that calls for the creation of a list of many distribution objects. The distributions can be of various types, with various parameters. We ran into some problems, so I started testing on a simple list of rnorm-based objects.
I was a little surprised at the RAM storage requirements. Here's an example:

N <- 10000
closureList <- vector("list", N)
nsize <- sample(x = 1:100, size = N, replace = TRUE)
for (i in seq_along(nsize)) {
    closureList[[i]] <- list(func = rnorm, n = nsize[i])
}
format(object.size(closureList), units = "Mb")

The output says 22.4 Mb.

I noticed that if I do not name the objects in the list, then the storage drops to 19.9 Mb. That seemed like a lot of storage for a function's name. Why so much? My colleagues think the RAM use is high because this is a closure (hence closureList). I can't even convince myself it actually is a closure. The R source has

rnorm <- function(n, mean=0, sd=1) .Call(C_rnorm, n, mean, sd)

The storage seems to be holding 10000 copies of rnorm, but we really only need 1, which we can use in all of the objects.

Thinking of this like C, I am looking to pass in a pointer to the function. I found my way to the idea of putting a function in an environment in order to pass it by reference:

rnormPointer <- function(inputValue1, inputValue2) {
    object <- new.env(parent = globalenv())
    object$distr <- inputValue1
    object$n <- inputValue2
    class(object) <- 'pointer'
    object
}

## Experiment with that
gg <- rnormPointer(rnorm, 33)
gg$distr(gg$n)

ptrList <- vector("list", N)
for (i in seq_along(nsize)) {
    ptrList[[i]] <- rnormPointer(rnorm, nsize[i])
}
format(object.size(ptrList), units = "Mb")

The reported storage is reduced to 2.6 Mb. That's about 1/10 of the RAM reported for closureList.

This thing works in the way I expect:

## can pass in the unnamed arguments for n, mean and sd here
ptrList[[1]]$distr(33, 100, 10)
## Or the named arguments
ptrList[[1]]$distr(1, sd = 100)

This environment trick mostly works, so far as I can see, but I have these questions.

1. Is the object.size() return accurate for ptrList? Do I really reduce storage to that amount, or is the required storage someplace else (in the new environment) that is not included in object.size()?

2. Am I running with scissors here?
   Unexpected bad things await?

3. Why is the storage for closureList so great? It looks to me like rnorm is just this little thing:

function (n, mean = 0, sd = 1)
.Call(C_rnorm, n, mean, sd)
<bytecode: 0x55cc9988cae0>

4. Could I learn (or could you show me how) to store the bytecode address as a thing and use it in the objects? I'd guess that is the fastest possible way. In an Objective-C problem in the olden days, we found that the method lookup was a major slowdown, and one of the programmers showed us how to save the lookup and use it over and over.

pj
--
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

To write to me directly, please address me at pauljohn at ku.edu.
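
For questions 1 and 3, here is a minimal sketch of a check one could run. It assumes object.size() behaves as its help page describes: it does not detect shared list elements, and it does not count the space associated with an environment. All variable names below are made up for illustration and are not part of the code above.

```r
## Hypothetical check; the object names are illustrative only.

## Part 1: a function stored in two list slots is the same object,
## but object.size() charges for it once per slot.
one <- list(rnorm)
two <- list(rnorm, rnorm)
size_fun <- as.numeric(object.size(rnorm))
size_one <- as.numeric(object.size(one))
size_two <- as.numeric(object.size(two))
## How much the reported size grows when the second slot is added
extra_for_second_slot <- size_two - size_one

## Part 2: object.size() does not look inside an environment,
## so the reported size should not change when we fill it.
e <- new.env(parent = emptyenv())
size_empty <- as.numeric(object.size(e))
e$payload <- numeric(1e6)   # about 8 million bytes of doubles
size_full <- as.numeric(object.size(e))

print(c(fun = size_fun, extra_for_second_slot = extra_for_second_slot))
print(c(empty_env = size_empty, env_with_payload = size_full))
```

If the growth in Part 1 is at least the size of rnorm itself, the figure reported for closureList may overstate the RAM actually used, since the slots would all point at one function. If the two sizes in Part 2 are equal, the figure reported for ptrList likely understates it, because the contents of each environment are not counted.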