WamBamBoozle commented on a change in pull request #31162:
URL: https://github.com/apache/spark/pull/31162#discussion_r558449658
##########
File path: R/pkg/inst/worker/daemon.R
##########
@@ -32,6 +32,9 @@ inputCon <- socketConnection(
SparkR:::doServerAuth(inputCon, Sys.getenv("SPARKR_WORKER_SECRET"))
+# Application-specific daemon initialization. Typical use is loading libraries.
+eval(parse(text = Sys.getenv("SPARKR_DAEMON_INIT")))
+
Review comment:
To give you a sense of the cost of this, consider
```
> library(microbenchmark)
> microbenchmark(NULL, times = 999999)
Unit: nanoseconds
expr min lq mean median uq max neval
NULL 2 4 4.607765 4 5 11552 999999
```
so on my 2018 MacBook Pro (2.2 GHz 6-Core Intel Core i7), R evaluates NULL in
about 4 nanoseconds (median).
```
> Sys.setenv(x="NULL")
> microbenchmark(eval(parse(text = Sys.getenv("x"))), times=99999)
Unit: microseconds
                                expr    min    lq     mean median      uq      max neval
 eval(parse(text = Sys.getenv("x"))) 33.854 35.82 40.15034 37.479 39.4475 7219.072 99999
```
So unpacking the environment variable and evaluating it costs about 40
microseconds on average.
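If even that 40 microseconds mattered on a fork-heavy workload, a guard could skip the parse/eval entirely whenever the variable is unset, which is the common case. A rough sketch (not part of this patch, untested):
```
# Sketch: only parse and eval when SPARKR_DAEMON_INIT is actually set.
# nzchar() is TRUE for a non-empty string; Sys.getenv() returns "" when
# the variable is unset, so unset daemons pay only the getenv cost.
init <- Sys.getenv("SPARKR_DAEMON_INIT")
if (nzchar(init)) {
  eval(parse(text = init))
}
```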
For comparison, consider
- the 6 milliseconds we spend loading worker.R at every fork (we could load
it once and then invoke it as a function, saving 6 milliseconds on every
fork).
- the time applications save by moving their initialization here. For
example, our application takes 5 seconds to load its libraries, and we have
40 thousand partitions: 5 s * 40,000 = 200,000 s, about 56 hours of CPU time
saved on every call to gapply.
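Spelling out the arithmetic behind that estimate (using the 5 s load time and 40,000 partitions from our workload):
```
# Per-fork library-load time and partition count from our application.
seconds_per_load <- 5
partitions <- 40000

total_cpu_seconds <- seconds_per_load * partitions  # 200,000 s
total_cpu_seconds / 3600                            # ~55.6 hours, i.e. roughly 56
```
Against that, the 40-microsecond eval overhead per fork is noise: 40 us * 40,000 forks is about 1.6 seconds total.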
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]