AngersZhuuuu opened a new pull request #29406:
URL: https://github.com/apache/spark/pull/29406
### What changes were proposed in this pull request?
Partition filters have been partially pushed down since SPARK-28169. This PR
extends that to data filters: when a predicate mixes partition filters and
data filters, the data-filter part can also be partially pushed down. For example:
```
spark.sql(
s"""
|CREATE TABLE t(i INT, p STRING)
|USING parquet
|PARTITIONED BY (p)""".stripMargin)
spark.range(0, 1000).selectExpr("id as col").createOrReplaceTempView("temp")
for (part <- Seq(1, 2, 3, 4)) {
sql(s"""
|INSERT OVERWRITE TABLE t PARTITION (p='$part')
|SELECT col FROM temp""".stripMargin)
}
spark.sql("SELECT * FROM t WHERE (p = '1' AND i = 1) OR (p = '2' AND i = 2)").explain()
```
In this query, the data filter `i = 1 OR i = 2` can also be pushed down.
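The idea can be sketched outside Spark with a small stand-in expression tree. This is only an illustrative model, not the code in this PR: `Expr`, `EqualTo`, `And`, `Or`, and `extractWithin` are hypothetical names defined here, not Catalyst classes. The sketch assumes the usual rule for deriving a weaker predicate: for `AND`, either side that references only the target columns may be kept; for `OR`, both sides must yield a predicate, otherwise nothing can be extracted.

```scala
object PartialPushDownSketch {
  // Hypothetical stand-ins for Catalyst expressions (not Spark's classes).
  sealed trait Expr { def refs: Set[String] }
  case class EqualTo(column: String, value: String) extends Expr {
    def refs: Set[String] = Set(column)
  }
  case class And(left: Expr, right: Expr) extends Expr {
    def refs: Set[String] = left.refs ++ right.refs
  }
  case class Or(left: Expr, right: Expr) extends Expr {
    def refs: Set[String] = left.refs ++ right.refs
  }

  // Returns a predicate that references only `columns` and is implied by
  // `cond`, or None when no such predicate can be derived.
  def extractWithin(cond: Expr, columns: Set[String]): Option[Expr] = cond match {
    case And(l, r) =>
      (extractWithin(l, columns), extractWithin(r, columns)) match {
        case (Some(a), Some(b)) => Some(And(a, b))
        // One usable side is enough for a conjunction.
        case (a, b) => a.orElse(b)
      }
    case Or(l, r) =>
      // A disjunction needs a usable predicate from both sides.
      for {
        a <- extractWithin(l, columns)
        b <- extractWithin(r, columns)
      } yield Or(a, b)
    case leaf =>
      if (leaf.refs.subsetOf(columns)) Some(leaf) else None
  }

  // The predicate from the example query: (p = '1' AND i = 1) OR (p = '2' AND i = 2)
  val cond: Expr = Or(
    And(EqualTo("p", "1"), EqualTo("i", "1")),
    And(EqualTo("p", "2"), EqualTo("i", "2")))

  // Data filter over column i only, and partition filter over column p only.
  val dataFilter: Option[Expr] = extractWithin(cond, Set("i"))
  val partitionFilter: Option[Expr] = extractWithin(cond, Set("p"))
}
```

The extracted predicate is weaker than the original condition, so in this model the full condition still has to be evaluated on the rows that the scan returns.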
### Why are the changes needed?
To extract more data filters for `FileSourceScanExec`.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Added unit tests.