WamBamBoozle commented on a change in pull request #31162:
URL: https://github.com/apache/spark/pull/31162#discussion_r558449658
##########
File path: R/pkg/inst/worker/daemon.R
##########
@@ -32,6 +32,9 @@ inputCon <- socketConnection(
 SparkR:::doServerAuth(inputCon, Sys.getenv("SPARKR_WORKER_SECRET"))
+# Application-specific daemon initialization. Typical use is loading libraries.
+eval(parse(text = Sys.getenv("SPARKR_DAEMON_INIT")))
+

Review comment:
To give you a sense of the cost of this, consider the following timings (using the microbenchmark package):

```
> microbenchmark(NULL, times = 999999)
Unit: nanoseconds
 expr min lq     mean median uq   max  neval
 NULL   2  4 4.607765      4  5 11552 999999
```

So on my 2018 MacBook Pro (2.2 GHz 6-Core Intel Core i7), R evaluates NULL in about 4 nanoseconds (median).

```
> Sys.setenv(x="NULL")
> microbenchmark(eval(parse(text = Sys.getenv("x"))), times=99999)
Unit: microseconds
                                expr    min    lq     mean median      uq      max neval
 eval(parse(text = Sys.getenv("x"))) 33.854 35.82 40.15034 37.479 39.4475 7219.072 99999
```

It takes about 40 microseconds (mean; the median is 37.5) to read the environment variable, parse it, and evaluate the result.

For comparison, consider:

- the 6 milliseconds we spend loading worker.R on every fork (we could load it once and then invoke it as a function, saving 6 milliseconds on every fork);
- the time applications save by moving their initialization here. For example, our application takes 5 seconds to load its libraries, and we have 40,000 partitions. If n is the number of executors, then for every call to gapply() the wall time saved is (5 seconds) * (40,000 partitions) / n ≈ 56 hours / n.
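Spelling out the arithmetic in the second bullet, and setting it against the roughly 40 µs init cost measured above. This assumes, as the bullet does, that each partition would otherwise pay the 5-second library load once and that the work is spread evenly over the n executors; the cost line is a deliberately pessimistic bound that charges the 40 µs once per partition.

```latex
% Wall time saved per gapply() call, spread over n executors
T_{\text{saved}} = \frac{5\,\text{s} \times 40{,}000}{n}
                 = \frac{200{,}000\,\text{s}}{n}
                 \approx \frac{55.6\,\text{h}}{n}

% Worst-case total cost of eval(parse(text = Sys.getenv(...))),
% charged once per partition
T_{\text{cost}} \le 40{,}000 \times 40\,\mu\text{s} = 1.6\,\text{s}
```

Even under that pessimistic bound, the parse-and-eval overhead is about five orders of magnitude smaller than the time saved.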