GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/18320

    [SPARK-21093][R] Avoid mcfork in R's daemon in gapply/gapplyCollect tests

    ## What changes were proposed in this pull request?
    
    `mcfork` in R appears to open a pipe ahead of time, but the existing logic does not
properly close it when it is executed hot (that is, repeatedly in quick succession). As a
result, further forks eventually fail because the limit on the number of open files is reached.
    
    This hot execution appears to occur particularly with `gapply`/`gapplyCollect`. For
an unknown reason, it happens more easily on CentOS, and it can also be reproduced on
Mac.
    
    All the details are described in 
https://issues.apache.org/jira/browse/SPARK-21093
    
    This PR simply proposes to avoid reusing that daemon and instead to use a separate
process from the JVM each time; those processes all appear to terminate correctly.
    
    ## How was this patch tested?
    
    I ran the code below on both CentOS and Mac.
    
    ```r
    df <- createDataFrame(list(list(1L, 1, "1", 0.1)), c("a", "b", "c", "d"))
    collect(gapply(df, "a", function(key, x) { x }, schema(df)))
    collect(gapply(df, "a", function(key, x) { x }, schema(df)))
    ...  # 30 times
    ```
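    
    The repeated calls above could also be driven by a small loop. The following is only a sketch: `run_repeatedly` is a hypothetical helper, not part of SparkR or of this patch, and the commented usage assumes a SparkR session has already been started with `sparkR.session()`.
    
    ```r
    # Hypothetical helper (not part of SparkR or this PR): call `action` up to
    # `times` times, stop at the first error, and return the number of calls
    # that completed successfully.
    run_repeatedly <- function(action, times = 30L) {
      for (i in seq_len(times)) {
        ok <- tryCatch({
          action()
          TRUE
        }, error = function(e) {
          message("iteration ", i, " failed: ", conditionMessage(e))
          FALSE
        })
        if (!ok) return(i - 1L)
      }
      times
    }
    
    # Usage with SparkR (assumes an active session; left commented out so the
    # helper itself runs without Spark):
    # df <- createDataFrame(list(list(1L, 1, "1", 0.1)), c("a", "b", "c", "d"))
    # run_repeatedly(function() collect(gapply(df, "a", function(key, x) { x }, schema(df))))
    ```
    
    A return value smaller than `times` would indicate the iteration at which the call started failing, which can make it easier to see how many executions are needed to hit the problem.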
    
    In addition, the R tests now pass on CentOS, as shown below:
    
    ```
    SparkSQL functions: Spark package found in SPARK_HOME: .../spark
    
..............................................................................................................................................................
    
..............................................................................................................................................................
    
..............................................................................................................................................................
    
..............................................................................................................................................................
    
..............................................................................................................................................................
    
....................................................................................................................................
    ```


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-21093

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18320.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18320
    
----
commit 505d75f0e9a90481f96d0f1fefd4f9baaa38ee7d
Author: hyukjinkwon <[email protected]>
Date:   2017-06-16T02:37:53Z

    Avoid mcfork in R's daemon in gapply tests

----

