GitHub user HyukjinKwon opened a pull request:
https://github.com/apache/spark/pull/22954
[DO-NOT-MERGE][POC] Enables Arrow optimization from R DataFrame to Spark DataFrame
## What changes were proposed in this pull request?
This PR is not intended for merging; it demonstrates the feasibility
(reusing the PyArrow code path where possible) and the performance improvement of
converting R data.frames into Spark DataFrames. This can be tested as below:
```bash
$ ./bin/sparkR --conf spark.sql.execution.arrow.enabled=true
```
```r
collect(createDataFrame(mtcars))
```
**Requirements:**
- R 3.5.x
- Arrow package 0.12+ (not released yet)
  - CRAN release tracked in ARROW-3204
- withr package
**TODOs:**
- [ ] Performance measurement
- [ ] TBD
## How was this patch tested?
A small test was added.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HyukjinKwon/spark r-arrow-createdataframe
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22954.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22954
----
commit 90011a5ff48f2c5fa5fae0e2573fcdaa85d44976
Author: hyukjinkwon <gurwls223@...>
Date: 2018-11-06T02:38:37Z
[POC] Enables Arrow optimization from R DataFrame to Spark DataFrame
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]