WamBamBoozle commented on a change in pull request #31162:
URL: https://github.com/apache/spark/pull/31162#discussion_r558449658



##########
File path: R/pkg/inst/worker/daemon.R
##########
@@ -32,6 +32,9 @@ inputCon <- socketConnection(
 
 SparkR:::doServerAuth(inputCon, Sys.getenv("SPARKR_WORKER_SECRET"))
 
+# Application-specific daemon initialization. Typical use is loading libraries.
+eval(parse(text = Sys.getenv("SPARKR_DAEMON_INIT")))
+

Review comment:
       To give you a sense of the cost of this, consider
   ```
   > library(microbenchmark)
   > microbenchmark(NULL, times = 999999)
   Unit: nanoseconds
    expr min lq     mean median uq   max  neval
    NULL   2  4 4.607765      4  5 11552 999999
   ```
   so on my 2017 MacBook Pro, R evaluates NULL in about 4 nanoseconds (median).
   ```
   > Sys.setenv(x = "NULL")
   > microbenchmark(eval(parse(text = Sys.getenv("x"))), times = 99999)
   Unit: microseconds
                                   expr    min    lq     mean median      uq      max neval
    eval(parse(text = Sys.getenv("x"))) 33.854 35.82 40.15034 37.479 39.4475 7219.072 99999
   ```
   So it takes roughly 40 microseconds to fetch the environment variable, parse it, and evaluate it.
   
   For comparison, consider
   
    - the 6 milliseconds we spend loading worker.R at every fork; we could load it once and then invoke it as a function, saving those 6 milliseconds on every fork, but that obvious optimization just doesn't seem worth it, and
    - the time per fork that applications save by moving their initialization here; our application, for example, takes 5 seconds to load its libraries on every fork (see the sketch below).
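   
   For illustration, here is a minimal sketch of how an application might use the hook, assuming the variable is forwarded to executors via `spark.executorEnv.*` (an assumption on my part, not something this patch prescribes) and with placeholder library names:
   ```
   # Hypothetical sketch (not part of this patch): forward the init expression to
   # executors with spark.executorEnv.*, so the daemon evaluates it once and every
   # forked worker inherits the already-loaded libraries. The library names below
   # stand in for whatever costs our application ~5 seconds per fork today.
   library(SparkR)

   sparkR.session(sparkConfig = list(
     "spark.executorEnv.SPARKR_DAEMON_INIT" = 'suppressPackageStartupMessages({
       library(data.table)
       library(jsonlite)
     })'
   ))
   ```
   Done this way, the libraries load once per daemon process rather than once per fork, which is where the per-fork saving would come from.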




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


