Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/18465
> isn't being "terminated" enough? that we have to pskill it again?
Indeed, this is the point. I really tested so many times but `exit` does
not terminate (I think I might have to change the term used in the comments in
my PR, maybe to 'kill').
This can be tested as below:
```
vi tmp.R
```
```r
for(i in 0:50) {
p <- parallel:::mcfork()
if (inherits(p, "masterProcess")) {
# Send no data
parallel:::mcexit(0L)
}
}
Sys.sleep(10)
```
```
Rscript tmp.R
```
and, `ps -fe | grep /exec/R` shows the children processes.
> the original issue with leaking was thought to be related to child
process getting stuck and not terminated properly?
I think yes, it is. It looked they are killed but somehow pipes (file
descriptors) were left open. This hit the ulimit of open files and then it went
an error like `Resource temporarily unavailable fork`. This looked because the
children were killed by itself before or while calling `exit()` and left
incomplete.
> would that manifest again under this new behavior? in other words, would
it get into a state where integer is never returned to the master?
In the original behaviour, I think the problem is `exit`is not being
properly called or terminated during `exit`. The new change here I think it
makes sure the children call `exit` properly.
I think this test makes sure what I said:
```r
read.pids <- function(children) {
lapply(children, function(child) {
print(parallel:::readChild(child))
})
}
kill <- function(children) {
lapply(children, function(child) {
tools::pskill(child, tools::SIGUSR1)
})
}
p <- parallel:::mcfork()
if (inherits(p, "masterProcess")) {
parallel:::mcexit(0L)
}
Sys.sleep(3)
print("Children exited - new behaviour here")
children <- parallel:::selectChildren(timeout = 0)
if (is.integer(children)) {
read.pids(children)
kill(children)
}
p <- parallel:::mcfork()
if (inherits(p, "masterProcess")) {
tools::pskill(Sys.getpid(), tools::SIGUSR1)
parallel:::mcexit(0L)
}
Sys.sleep(3)
print("Children killed itself - old behaviour here")
children <- parallel:::selectChildren(timeout = 0)
if (is.integer(children)) {
read.pids(children)
}
p <- parallel:::mcfork()
if (inherits(p, "masterProcess")) {
Sys.sleep(10000)
}
Sys.sleep(3)
print("Children killed by parent without exiting")
children <- parallel:::selectChildren(timeout = 0)
if (is.integer(children)) {
kill(children)
# wait for kill
Sys.sleep(1)
read.pids(children)
}
```
In my local, this prints the pid when `mcexit` is called properly.
```r
[1] "Children exited - new behaviour here"
[1] 12992
[[1]]
[1] FALSE
[1] "Children killed itself - old behaviour here"
[1] "Children killed by parent without exiting"
```
And here we kill it.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]