GitHub user rdblue opened a pull request:
https://github.com/apache/spark/pull/21262
[SPARK-24172][SQL]: Push projection and filters once when converting to physical plan.
## What changes were proposed in this pull request?
This removes `PushDownOperatorsToDataSource` and moves projection and filter
push-down to `DataSourceV2Strategy`. This accomplishes the same goal as #21230
and runs the push-down only once, because it no longer uses `transformUp` to
traverse the plan.
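As a rough illustration of "push once during physical planning", here is a toy Python model. It is not Spark's code: `Relation`, `Filter`, `Project`, `Scan`, `configure_reader`, and `plan` are all hypothetical stand-ins. The toy strategy walks the `Project`/`Filter` nodes above a relation a single time, then configures the source with every filter and the required columns in one call, instead of re-running push-down at each level of a bottom-up traversal.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical toy plan nodes; these are NOT Spark's classes.
@dataclass(frozen=True)
class Relation:
    output: Tuple[str, ...]  # never rewritten by push-down

@dataclass(frozen=True)
class Filter:
    condition: Tuple[str, str, object]  # (column, operator, value)
    child: object

@dataclass(frozen=True)
class Project:
    columns: Tuple[str, ...]
    child: object

@dataclass(frozen=True)
class Scan:
    """Physical scan carrying the pushed filters and pruned columns."""
    relation: Relation
    pushed_filters: Tuple[tuple, ...]
    read_columns: Tuple[str, ...]

push_calls = 0  # how many times the source has been configured

def configure_reader(relation, filters, required):
    """Hypothetical stand-in for a source's push-down API."""
    global push_calls
    push_calls += 1
    # Keep the relation's column order; read only what the query needs.
    columns = tuple(c for c in relation.output if c in required)
    return Scan(relation, tuple(filters), columns)

def plan(node):
    """Toy strategy: walk Project/Filter nodes down to the Relation once,
    then push all filters and the required columns in a single call."""
    projected = None
    filters = []
    while not isinstance(node, Relation):
        if isinstance(node, Project):
            if projected is None:  # outermost projection is the final output
                projected = set(node.columns)
        elif isinstance(node, Filter):
            filters.append(node.condition)
        else:
            raise TypeError(f"unsupported node: {node!r}")
        node = node.child
    if projected is None:
        projected = set(node.output)
    # Conservatively also read filter columns, in case the source can't apply them.
    required = projected | {column for column, _, _ in filters}
    return configure_reader(node, filters, required)

people = Relation(("name", "age", "city"))
query = Project(("name",), Filter(("age", ">", 21), people))
scan = plan(query)
print(scan.read_columns)    # ('name', 'age')
print(scan.pushed_filters)  # (('age', '>', 21),)
print(push_calls)           # 1
```

Note that the logical `people` relation is never rewritten: pruning produces a new `Scan`, so `people.output` still lists all three columns after planning.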
Unlike #21230, this moves push-down into the v2 strategy to match how
push-down happens for other code paths: when creating a physical plan from a
logical plan. This was suggested by @marmbrus in #20387 but not implemented at
the time. The same concern from that PR still applies to this commit:
**push-down is not applied until conversion to a physical plan, so
`computeStats` can't report stats after filtering or projecting**.
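To make the `computeStats` concern concrete, here is another toy Python model. It is not Spark's estimator: `ToyRelation`, `pushed_scan_size`, the per-column byte widths, and the filter selectivity are all made-up assumptions. Because the logical relation still exposes every column until physical planning, a size estimate computed from it cannot reflect pruned columns or pushed filters; only the physical scan knows about them.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass
class ToyRelation:
    """Hypothetical logical relation; not Spark's DataSourceV2Relation."""
    output: Tuple[str, ...]
    widths: Dict[str, int]  # assumed bytes per value for each column
    row_count: int

    def compute_stats(self) -> int:
        """Estimated size in bytes, computed from the full, unpruned output."""
        return self.row_count * sum(self.widths[c] for c in self.output)

def pushed_scan_size(rel: ToyRelation, read_columns, selectivity: float) -> int:
    """Size after pruning columns and pushing a filter with an assumed
    selectivity; in this model it is only known after physical planning."""
    rows = int(rel.row_count * selectivity)
    return rows * sum(rel.widths[c] for c in read_columns)

people = ToyRelation(("name", "age", "city"), {"name": 20, "age": 4, "city": 16}, 1000)
logical = people.compute_stats()                           # what the optimizer sees
physical = pushed_scan_size(people, ("name", "age"), 0.5)  # after push-down
print(logical, physical)  # 40000 12000
```

The gap between the two numbers is exactly the information that, under this approach, is unavailable to logical-plan statistics.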
A benefit of this approach is that the `DataSourceV2Relation` is simpler
and the relation's `output` is constant.
## How was this patch tested?
This uses existing tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/rdblue/spark SPARK-24172-v2-pushdown-in-strategy
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21262.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21262
----
commit 7497cc2308136ded913b3745b9232487a949804a
Author: Ryan Blue <blue@...>
Date: 2018-05-07T20:08:02Z
DataSourceV2: push projection, filters when converting to physical plan.
----