GitHub user HyukjinKwon opened a pull request:
https://github.com/apache/spark/pull/18320
[SPARK-21093][R] Avoid mcfork in R's daemon in gapply/gapplyCollect tests
## What changes were proposed in this pull request?
`mcfork` in R appears to open a pipe beforehand, but the existing logic does not
close it properly when forks are executed "hot", that is, in quick succession.
The leaked pipes eventually hit the limit on the number of open files, so
further forking fails.
This hot execution appears to happen particularly with `gapply`/`gapplyCollect`.
For an unknown reason, the failure occurs more easily on CentOS, but it can be
reproduced on Mac too.
All the details are described in
https://issues.apache.org/jira/browse/SPARK-21093
This PR simply proposes to avoid reusing that daemon and instead to launch each
R worker process from the JVM; those processes appear to all terminate correctly.
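For illustration only, here is a minimal sketch of starting a SparkR session with the daemon turned off. It assumes the JVM-side `spark.sparkr.use.daemon` setting, which controls whether R workers are forked by the long-running R daemon or launched directly by the JVM. This is not necessarily how this patch applies the change.

```r
library(SparkR)

# Assumption: the JVM reads "spark.sparkr.use.daemon" when it starts R workers.
# With "false", the JVM launches each R worker process itself instead of
# having the R daemon fork one with mcfork.
sparkR.session(sparkConfig = list(spark.sparkr.use.daemon = "false"))
```

This is a core Spark setting rather than a SQL one. It therefore presumably has to be in place when `sparkR.session()` launches a new JVM, rather than being changed on a session that is already running.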
## How was this patch tested?
I ran the code below on both CentOS and Mac.
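```r
library(SparkR)  # assumes SparkR is installed and SPARK_HOME is set
sparkR.session()

df <- createDataFrame(list(list(1L, 1, "1", 0.1)), c("a", "b", "c", "d"))
# Run the same gapply repeatedly so that R workers are forked in quick succession
for (i in seq_len(30)) {
  collect(gapply(df, "a", function(key, x) { x }, schema(df)))
}
```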
With this change, the R tests also pass on CentOS:
```
SparkSQL functions: Spark package found in SPARK_HOME: .../spark
..............................................................................................................................................................
..............................................................................................................................................................
..............................................................................................................................................................
..............................................................................................................................................................
..............................................................................................................................................................
....................................................................................................................................
```
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HyukjinKwon/spark SPARK-21093
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/18320.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #18320
----
commit 505d75f0e9a90481f96d0f1fefd4f9baaa38ee7d
Author: hyukjinkwon
Date: 2017-06-16T02:37:53Z
Avoid mcfork in R's daemon in gapply tests
----