GitHub user rdblue opened a pull request:
https://github.com/apache/spark/pull/20387
SPARK-22386: DataSourceV2: Use immutable logical plans.
## What changes were proposed in this pull request?
DataSourceV2 should use immutable catalyst trees instead of wrapping a
mutable DataSourceV2Reader. This commit updates DataSourceV2Relation and
consolidates much of the DataSourceV2 API requirements for the read path in it.
Instead of wrapping a reader that changes, the relation lazily produces a
reader from its configuration.
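The idea of an immutable relation that lazily derives its reader can be illustrated with a small, language-neutral sketch. This is not Spark's code: `Reader`, `Relation`, and their fields are hypothetical stand-ins, and the real classes are Scala Catalyst nodes. The sketch only shows that "pushing" something produces a new relation, and that the mutable reader is built on demand from that relation's configuration.

```python
from dataclasses import dataclass, replace
from functools import cached_property
from typing import Tuple


class Reader:
    """Hypothetical stand-in for a mutable reader (not Spark's DataSourceV2Reader)."""

    def __init__(self, options):
        self.options = dict(options)
        self.columns = None   # None means "read every column"
        self.filters = []

    def prune_columns(self, columns):
        self.columns = list(columns)

    def push_filters(self, filters):
        self.filters = list(filters)


@dataclass(frozen=True)
class Relation:
    """Immutable plan node: configuration only, no reader state."""

    options: Tuple[Tuple[str, str], ...]
    projection: Tuple[str, ...] = ()
    filters: Tuple[str, ...] = ()

    @cached_property
    def reader(self):
        # Built lazily, once per relation, purely from the relation's fields.
        r = Reader(self.options)
        if self.projection:
            r.prune_columns(self.projection)
        if self.filters:
            r.push_filters(self.filters)
        return r


base = Relation(options=(("path", "/tmp/t"),))
# Push-down yields a new node; the original relation is left untouched.
pushed = replace(base, projection=("id",), filters=("id > 5",))
```

Because each relation owns its own lazily built reader, copying or transforming the plan never changes what an earlier node would read.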
This commit also updates the predicate and projection push-down. Instead of
the implementation from SPARK-22197, this reuses the rule matching from the
Hive and DataSource read paths (using `PhysicalOperation`) and copies most of
the implementation of `SparkPlanner.pruneFilterProject`, with updates for
DataSourceV2. Reusing the implementation from the other read paths should
introduce fewer regressions relative to those paths and leaves less code to
maintain.
The new push-down rules also support the following edge cases:
* The output of DataSourceV2Relation should be what is returned by the
  reader, in case the reader can only partially satisfy the requested schema
  projection
* The requested projection passed to the DataSourceV2Reader should include
  filter columns
* The push-down rule may be run more than once if filters are not pushed
  through projections
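The three edge cases above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the rule in this pull request: `Filter`, `Scan`, `plan_scan`, `pushable_ops`, and `can_prune` are all hypothetical names, and filters are reduced to single-column comparisons.

```python
from typing import NamedTuple, Tuple


class Filter(NamedTuple):
    """Hypothetical single-column predicate, e.g. Filter("ts", ">", 5)."""
    column: str
    op: str
    value: object


class Scan(NamedTuple):
    requested: Tuple[str, ...]    # projection handed to the reader
    output: Tuple[str, ...]       # what the reader actually returns
    pushed: Tuple[Filter, ...]    # filters the reader evaluates
    post_scan: Tuple[Filter, ...] # filters still evaluated after the scan


def plan_scan(table_columns, projection, filters, pushable_ops, can_prune=True):
    # Edge case 2: the requested projection must include filter columns,
    # otherwise post-scan filters could not be evaluated.
    needed = list(projection)
    for f in filters:
        if f.column not in needed:
            needed.append(f.column)
    requested = tuple(needed)

    # Edge case 1: the relation's output is whatever the reader returns,
    # which may be wider than requested if the reader cannot prune.
    output = requested if can_prune else tuple(table_columns)

    pushed = tuple(f for f in filters if f.op in pushable_ops)
    post_scan = tuple(f for f in filters if f.op not in pushable_ops)
    return Scan(requested, output, pushed, post_scan)


cols = ("id", "name", "ts")
filters = (Filter("ts", ">", 5), Filter("name", "like", "a%"))
scan = plan_scan(cols, ("id",), filters, {">"})

# Edge case 3: running the rule again on its own result changes nothing.
rerun = plan_scan(cols, scan.requested, filters, {">"})
```

In this sketch the second run is a no-op because the requested projection already contains every filter column, which is the property that makes repeated application safe.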
## How was this patch tested?
Existing push-down and read tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/rdblue/spark SPARK-22386-push-down-immutable-trees
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20387.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20387
----
commit d3233e1a8b1d4d153146b1a536dee34246920b0d
Author: Ryan Blue <blue@...>
Date: 2018-01-17T21:58:12Z
SPARK-22386: DataSourceV2: Use immutable logical plans.
----