GitHub user ueshin opened a pull request:

    https://github.com/apache/spark/pull/19349

    [SPARK-22125][PYSPARK][SQL] Enable Arrow Stream format for vectorized UDF.

    ## What changes were proposed in this pull request?
    
    Currently we use the Arrow File format to communicate with the Python worker 
when invoking a vectorized UDF, but we can use the Arrow Stream format instead.
    
    This PR adds a config `"spark.sql.execution.arrow.stream.enable"` to enable 
the Arrow Stream format.
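
To illustrate why a stream format fits the worker protocol, here is a minimal stdlib-only sketch. It is a toy analogy, not Arrow's actual wire format and not the code in this PR: the 4-byte length prefix, the zero-length end marker, and the names `write_stream` and `read_stream` are all assumptions made for illustration. The point it tries to show is that a reader of a stream can hand each batch to the UDF as soon as it arrives, whereas (as far as I understand the Arrow IPC formats) the file format carries a footer at the end that indexes the record batches.

```python
import io
import struct

# Toy framing (hypothetical, NOT Arrow's real format): each batch is a
# 4-byte big-endian length followed by that many payload bytes, and a
# zero length marks the end of the stream.
END_OF_STREAM = struct.pack(">I", 0)


def write_stream(out, batches):
    """Write each batch as soon as it is available, then the end marker."""
    for payload in batches:
        if not payload:
            # A zero length is reserved for the end marker in this toy format.
            raise ValueError("empty batches are not supported")
        out.write(struct.pack(">I", len(payload)))
        out.write(payload)
    out.write(END_OF_STREAM)


def read_stream(inp):
    """Yield batches one at a time without knowing how many will follow."""
    while True:
        header = inp.read(4)
        if len(header) < 4:
            raise EOFError("stream ended before the end-of-stream marker")
        (length,) = struct.unpack(">I", header)
        if length == 0:
            return  # end-of-stream marker reached
        payload = inp.read(length)
        if len(payload) < length:
            raise EOFError("truncated batch")
        yield payload


# An in-memory buffer stands in for the socket to the Python worker.
buf = io.BytesIO()
write_stream(buf, [b"batch-0", b"batch-1", b"batch-2"])
buf.seek(0)
received = list(read_stream(buf))
print(received)
```

Because `read_stream` is a generator, the consumer can process the first batch before the writer has produced the last one, which is the property the stream format is meant to provide here.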
    
    ## How was this patch tested?
    
    Existing tests, and tests for vectorized UDF with the stream format enabled.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-22125

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19349.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19349
    
----
commit 3c45c5c132f91f32878dd52245a1beb55eca05e7
Author: Takuya UESHIN <[email protected]>
Date:   2017-09-21T05:33:01Z

    Extract PythonRunner from PythonRDD.scala file.

commit 1cd832cb796bdb7e330e56a953274ed577dc8876
Author: Takuya UESHIN <[email protected]>
Date:   2017-09-21T06:33:43Z

    Extract writer thread.

commit 919811d9ffacb8218acf7148b2f0918b255c4f3a
Author: Takuya UESHIN <[email protected]>
Date:   2017-09-21T07:23:55Z

    Extract reader iterator.

commit b2fed104ee00f5bf8235e21b01f89c98ec9400fc
Author: Takuya UESHIN <[email protected]>
Date:   2017-09-21T09:00:42Z

    Introduce ArrowStreamPythonUDFRunner.

commit 937292d0a2a2145be3dbc6314cf0da1b41e71b6e
Author: Takuya UESHIN <[email protected]>
Date:   2017-09-22T11:03:07Z

    Add ArrowStreamPandasSerializer.

commit 80167219abf98b8c019df3582a8c2b3ec6697753
Author: Takuya UESHIN <[email protected]>
Date:   2017-09-22T11:14:08Z

    Introduce ArrowStreamEvalPythonExec.

commit e62d619e13f63af5af2f386c0d7ab554ad3c6336
Author: Takuya UESHIN <[email protected]>
Date:   2017-09-22T11:36:11Z

    Enable vectorized UDF via Arrow stream protocol.

----

