GitHub user rdblue opened a pull request:
https://github.com/apache/spark/pull/22206
SPARK-25213: Add project to v2 scans before python filters.
## What changes were proposed in this pull request?
The v2 API always adds a projection when converting to a physical plan to
ensure that all rows are `UnsafeRow`. This projection is added after any filters
run by Spark, on the assumption that both the filter and the projection can
handle `InternalRow`, but that assumption breaks if those nodes contain Python
UDFs. This PR detects the Python UDFs and adds a projection above the filter so
that rows are converted to `UnsafeRow` before data is passed to Python.
## How was this patch tested?
This adds a test for the case reported in SPARK-25213 to Python's SQL tests.
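As a rough illustration of the scenario the test covers (a minimal sketch, not the PR's actual test code; the source class name and column name below are assumptions), the failing shape is a v2 scan filtered by a Python UDF:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import BooleanType

spark = (SparkSession.builder
         .master("local[2]")
         .appName("spark-25213-illustration")
         .getOrCreate())

# A Python UDF used as a filter predicate. Its input rows are serialized and
# shipped to a Python worker, which is why they need to be UnsafeRow by then.
positive = udf(lambda i: i > 2, BooleanType())

# Illustrative v2 source only; Spark's own test relies on a bundled test
# source that is available on the JVM classpath during the test run.
df = spark.read.format("org.apache.spark.sql.sources.v2.SimpleDataSourceV2").load()

# Before this patch, planning the Python-UDF filter directly on top of the v2
# scan could fail because the scan's rows were not yet UnsafeRow; the patch
# inserts a project below the Python filter to perform that conversion first.
df.filter(positive(col("i"))).show()
```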
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/rdblue/spark SPARK-25213-v2-add-project-before-python-filter
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22206.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22206
----
commit ada6e924597cbe247c8be47fd80df38b79cb34e5
Author: Ryan Blue <blue@...>
Date: 2018-08-23T17:37:19Z
SPARK-25213: Add project to v2 scans before python filters.
----