On 22/11/2017 11:29 AM, Paul Johnson wrote:
We have a project that calls for creating a list of many distribution
objects.  The distributions can be of various types with various
parameters, but we ran into some problems, so I started testing on a
simple list of rnorm-based objects.

I was a little surprised at the RAM storage requirements; here's an example:

N <- 10000
closureList <- vector("list", N)
nsize = sample(x = 1:100, size = N, replace = TRUE)
for (i in seq_along(nsize)){
     closureList[[i]] <- list(func = rnorm, n = nsize[i])
}
format(object.size(closureList), units = "Mb")

Output says
22.4 Mb


You should read the help page for object.size(). You're doing exactly the kind of thing that causes it to overestimate the amount of memory being used: every element of closureList holds a reference to the same rnorm object, but object.size() does not detect that sharing, so it counts the function once per element.
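
For a size estimate that tries to account for that sharing, a package
such as lobstr (or pryr) can help; assuming you have lobstr installed,
something along these lines should give a quite different number:

## obj_size() counts the shared copy of rnorm only once, unlike
## object.size(), which charges it to every element
lobstr::obj_size(rnorm)
lobstr::obj_size(list(func = rnorm, n = 1L))  # a single element on its own
lobstr::obj_size(closureList)                 # the whole list, sharing respected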

I'd suggest turning on memory profiling in Rprof() for a more accurate result, but it seems to be broken:

> Rprof(memory.profiling=TRUE)
> N <- 10000
> closureList <- vector("list", N)
> nsize = sample(x = 1:100, size = N, replace = TRUE)
> for (i in seq_along(nsize)){
+     closureList[[i]] <- list(func = rnorm, n = nsize[i])
+ }
> format(object.size(closureList), units = "Mb")
[1] "19.2 Mb"
> Rprof(NULL)
> summaryRprof()
Error in rowsum.default(c(as.vector(new.ftable), fcounts), c(names(new.ftable), :
  unimplemented type 'NULL' in 'HashTableSetup'
In addition: Warning message:
In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
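
In the meantime, a cruder cross-check in base R is to compare gc()
totals before and after building the list; that measures what R
actually retains rather than what object.size() adds up:

g0 <- gc()
closureList2 <- lapply(nsize, function(n) list(func = rnorm, n = n))
g1 <- gc()
## column 2 of the gc() matrix is the "(Mb)" used column, with one row
## each for Ncells and Vcells; the difference is the net Mb retained
sum(g1[, 2] - g0[, 2])

This is only approximate (it reflects everything else going on in the
session too), but it gives a number that is independent of object.size().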


Duncan Murdoch

I noticed that if I do not name the elements in the list, the
storage drops to 19.9 Mb.

That seemed like a lot of storage for what amounts to a function name.
Why so much?  My colleagues think the RAM use is high because this is
a closure (hence the name closureList).  I can't even convince myself
it actually is a closure. The R source has

rnorm <- function(n, mean=0, sd=1) .Call(C_rnorm, n, mean, sd)
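
(A quick check, for what it's worth: typeof() calls it a closure,
though that is true of any R function that is not a primitive.)

typeof(rnorm)          # "closure"
is.primitive(rnorm)    # FALSE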

The storage seems to hold 10000 copies of rnorm, but we really only
need one, which all of the objects could share.
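
In fact I am not sure whether 10000 real copies exist at all, or
whether object.size() is just counting the same function over and
over; a check with the lobstr package (assuming I am using obj_addr()
correctly) should settle that:

## if these print the same address, the elements all share one rnorm
lobstr::obj_addr(rnorm)
lobstr::obj_addr(closureList[[1]]$func)
lobstr::obj_addr(closureList[[2]]$func)

Either way, I would rather make the sharing explicit.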

Thinking of this like C,  I am looking to pass in a pointer to the
function.  I found my way to the idea of putting a function in an
environment in order to pass it by reference:

rnormPointer <- function(inputValue1, inputValue2){
     ## put the function and the sample size into a new environment;
     ## environments are passed by reference, not copied
     object <- new.env(parent=globalenv())
     object$distr <- inputValue1
     object$n <- inputValue2
     class(object) <- 'pointer'
     object
}

## Experiment with that
gg <- rnormPointer(rnorm, 33)
gg$distr(gg$n)

ptrList <- vector("list", N)
for(i in seq_along(nsize)) {
     ptrList[[i]] <- rnormPointer(rnorm, nsize[i])
}
format(object.size(ptrList), units = "Mb")

The required storage is reduced to 2.6 Mb.  That's about 1/10 of the
RAM required for closureList.  This thing works the way I expect:

## can pass in the unnamed arguments for n, mean and sd here
ptrList[[1]]$distr(33, 100, 10)
## Or the named arguments
ptrList[[1]]$distr(1, sd = 100)
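
One difference from plain lists that I already noticed: because these
'pointer' objects are environments, they have reference semantics, so
a copy of one is not independent of the original:

aa <- ptrList[[1]]
aa$n <- 999
ptrList[[1]]$n   # also 999 now; aa and ptrList[[1]] are the same environment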

This environment trick mostly works, so far as I can see, but I have
these questions.

1. Is the object.size() return accurate for ptrList?  Do I really
reduce storage to that amount, or is the required storage someplace
else (in the new environments) that is not included in object.size()?
(A small check I tried on this is after question 4 below.)

2. Am I running with scissors here? Unexpected bad things await?

3. Why is the storage for closureList so great?  Dividing it out,
22.4 Mb over 10000 elements is roughly 2.3 Kb per element, but rnorm
looks to me like just this little thing (see also the second check
after question 4):

function (n, mean = 0, sd = 1)
.Call(C_rnorm, n, mean, sd)
<bytecode: 0x55cc9988cae0>

4. Could I learn (could you show me?) how to store the bytecode
address as a thing and use it in the objects?  I'd guess that would
be the fastest possible way. In an Objective-C project in the olden
days, we found that method lookup was a major slowdown, and one of
the programmers showed us how to cache the lookup and reuse it over
and over.
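
To make questions 1 and 3 a bit more concrete, here are the two small
checks mentioned above (just sketches; I may be misreading how
object.size() treats environments):

## Question 1: if this stays tiny even though e holds a big vector,
## then object.size() is not looking inside environments, and the
## contents of my ptrList environments are not being measured either.
e <- new.env()
e$big <- rnorm(1e6)
format(object.size(e), units = "Kb")

## Question 3: 22.4 Mb over 10000 elements is about 2.3 Kb each; if
## object.size(rnorm) comes out near that, the whole function (formals,
## body, attributes) is being counted once per list element.
format(object.size(rnorm), units = "Kb")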

pj



