Hello R-help,

I've noticed that my 'parallel' jobs take too much memory to store and
transfer to the cluster workers. I've managed to trace it to the
following:

# `payload` is being written to the cluster worker.
# The function FUN was created as a closure inside my package:
payload$data$args$FUN
# function (l, ...) 
# withCallingHandlers(fun(l$x, ...), error = .wraperr(l$name))
# <bytecode: 0x5644a9f08a90>
# <environment: 0x5644aa841ad8>

# The function seems to bring a lot of captured data with it.
e <- environment(payload$data$args$FUN)
length(serialize(e, NULL))
# [1] 738202878
parent.env(e)
# <environment: namespace:mypackage>

# The parent environment is a named namespace, which serialize() writes
# as a reference, so the data must all be right here in `e`.
# What is it?

ls(e, all.names = TRUE)
# [1] "fun"
length(serialize(e$fun, NULL))
# [1] 317

# The only object in the environment is small!
# Where is the 700 megabytes of data?

length(serialize(e, NULL))
# [1] 536
length(serialize(payload$data$args$FUN, NULL))
# [1] 1722

And once I've observed `fun`, the environment becomes very small and
can now be serialized very compactly.
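A self-contained sketch of the effect described above. It assumes the standard R behaviour that an unevaluated promise stores its expression together with the environment it will be evaluated in, and that R drops that environment once the promise has been forced. `make_wrapper()` and `build()` are hypothetical stand-ins for the package code, and `big` plays the role of the captured data:

```r
# Hypothetical factory mirroring the naive closure construction: the
# argument `fun` is never touched, so it stays an unevaluated promise in
# the environment of the returned closure.
make_wrapper <- function(fun) {
  function(x, ...) fun(x, ...)
}

# Hypothetical caller whose frame holds a large object. The promise for
# `fun` records this frame as its evaluation environment, so `big` is
# reachable from the closure until the promise is forced.
build <- function() {
  big <- numeric(1e6)  # about 8 MB of doubles
  make_wrapper(identity)
}

w <- build()
before <- length(serialize(w, NULL))

# Accessing the binding with `$` forces the promise; afterwards the
# promise keeps only its value and no longer references build()'s frame.
invisible(environment(w)$fun)
after <- length(serialize(w, NULL))

c(before = before, after = after)
```

The first size is dominated by `big`, even though the closure never uses it; the second is small because only the forced value of `fun` is written.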

I managed to work around it by forcing the promise and explicitly
putting `fun` in a small environment when constructing the closure:

.wrapfun <- function(fun) {
 e <- new.env(parent = loadNamespace('mypackage'))
 e$fun <- fun
 # NOTE: a naive return(function(...)) could serialize to 700
 # megabytes due to `fun` seemingly being a promise (?). Once the
 # promise is resolved, suddenly `fun` is much more compact.
 ret <- function(l, ...) withCallingHandlers(
  fun(l$x, ...),
  error = .wraperr(l$name)
 )
 environment(ret) <- e
 ret
}
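For comparison, a minimal sketch of the `force()` variant asked about below, under the same promise semantics: `force()` evaluates the argument inside the factory, so the promise in the closure's environment keeps only the value of `fun`. The function names are hypothetical, and the package-specific `withCallingHandlers()`/`.wraperr()` wrapper is left out so that the example runs on its own:

```r
# Hypothetical factory: same shape as the naive closure, but the argument
# promise is forced before the closure is returned.
make_wrapper_forced <- function(fun) {
  force(fun)  # evaluate `fun` now; the promise then holds only its value
  function(x, ...) fun(x, ...)
}

# Hypothetical caller with a large local object that should not travel
# with the closure.
build_forced <- function() {
  big <- numeric(1e6)  # about 8 MB of doubles
  make_wrapper_forced(identity)
}

size_forced <- length(serialize(build_forced(), NULL))
size_forced
```

Unlike the explicit `new.env()` approach, the closure here still captures the factory's whole execution environment, so any other local variables (or unforced arguments) of the factory would be serialized with it; building a fresh environment makes exactly what is captured explicit.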

Is this analysis correct? Could a simple f <- force(fun) have sufficed?
Where can I read more about this kind of problem?

If this really is due to promises, what would be the downsides of
forcing them during serialization?

-- 
Best regards,
Ivan

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
