Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18465
  
    > isn't being "terminated" enough? that we have to pskill it again?
    
    Indeed, this is the point. I really tested so many times but `exit` does 
not terminate (I think I might have to change the term used in the comments in 
my PR, maybe to 'kill').
    
    This can be tested as below:
    
    ```
    vi tmp.R
    ```
    
    ```r
    for(i in 0:50) {
      p <- parallel:::mcfork()
      if (inherits(p, "masterProcess")) {
        # Send no data
        parallel:::mcexit(0L)
      }
    }
    
    Sys.sleep(10)
    ```
    
    ```
    Rscript tmp.R
    ```
    
    and, `ps -fe | grep /exec/R` shows the children processes.
    
     > the original issue with leaking was thought to be related to child 
process getting stuck and not terminated properly?
    
    I think yes, it is. It looked they are killed but somehow pipes (file 
descriptors) were left open. This hit the ulimit of open files and then it went 
an error like `Resource temporarily unavailable fork`. This looked because the 
children were killed by itself before or while calling `exit()` and left 
incomplete.
    
    >  would that manifest again under this new behavior? in other words, would 
it get into a state where integer is never returned to the master?
    
    In the original behaviour, I think the problem is `exit`is not being 
properly called or terminated during `exit`. The new change here I think it 
makes sure the children call `exit` properly. 
    
    I think this test makes sure what I said:
    
    ```r
    read.pids <- function(children) {
      lapply(children, function(child) {
        print(parallel:::readChild(child))
      })
    }
    
    kill <- function(children) {
      lapply(children, function(child) {
        tools::pskill(child, tools::SIGUSR1)
      })
    }
    
    
    p <- parallel:::mcfork()
    if (inherits(p, "masterProcess")) {
      parallel:::mcexit(0L)
    }
    
    Sys.sleep(3)
    
    print("Children exited - new behaviour here")
    children <- parallel:::selectChildren(timeout = 0)
    if (is.integer(children)) {
      read.pids(children)
      kill(children)
    }
    
    
    p <- parallel:::mcfork()
    if (inherits(p, "masterProcess")) {
      tools::pskill(Sys.getpid(), tools::SIGUSR1)
      parallel:::mcexit(0L)
    }
    
    Sys.sleep(3)
    
    print("Children killed itself - old behaviour here")
    children <- parallel:::selectChildren(timeout = 0)
    if (is.integer(children)) {
      read.pids(children)
    }
    
    
    p <- parallel:::mcfork()
    if (inherits(p, "masterProcess")) {
      Sys.sleep(10000)
    }
    
    Sys.sleep(3)
    
    print("Children killed by parent without exiting")
    children <- parallel:::selectChildren(timeout = 0)
    if (is.integer(children)) {
      kill(children)
      # wait for kill
      Sys.sleep(1)
      read.pids(children)
    }
    
    
    ```
    
    In my local, this prints the pid when `mcexit` is called properly.
    
    ```r
    [1] "Children exited - new behaviour here"
    [1] 12992
    [[1]]
    [1] FALSE
    
    [1] "Children killed itself - old behaviour here"
    [1] "Children killed by parent without exiting"
    ```
    
    And here we kill it.
    
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to