AngersZhuuuu commented on a change in pull request #28805:
URL: https://github.com/apache/spark/pull/28805#discussion_r444305821
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PruneFileSourcePartitions.scala
##########
@@ -87,8 +88,17 @@ private[sql] object PruneFileSourcePartitions extends
Rule[LogicalPlan] {
_,
_))
if filters.nonEmpty && fsRelation.partitionSchemaOption.isDefined =>
- val (partitionKeyFilters, _) = getPartitionKeyFiltersAndDataFilters(
- fsRelation.sparkSession, logicalRelation, partitionSchema, filters,
logicalRelation.output)
+ val predicates =
conjunctiveNormalFormAndGroupExpsByReference(filters.reduceLeft(And))
Review comment:
> I still don't see the rationale. What if we don't do the group by and simply apply the CNF conversion?
Consider a case like this:
```
TBL: test
PARTITION COLS : dt
SELECT * FROM test where (dt = 2 and id < 100 and id > 20 ) or dt = 3
```
If we don't group by reference, the condition `(dt = 2 and id < 100 and id > 20) or dt = 3` will be converted to
```
(dt = 3 or dt = 2) and (dt = 3 or id < 100) and (dt = 3 or id > 20)
```
But we know that only `(dt = 3 or dt = 2)` can be used as a partition pruning
predicate. By grouping by reference we can combine `id < 100` and `id > 20`, and
return
```
(dt = 3 or dt = 2) and (dt = 3 or (id < 100 and id > 20))
```
In other words, the final partition pruning strategy keeps a predicate as a
partition filter only if its references are a subset of partCols. When we
combine conditions grouped by reference, the result is still an `or` expression,
and an `or` can't be split by `splitConjunctivePredicates`, so the grouping
doesn't change the final push-down result.
It's OK to simply apply the CNF rule, but grouping by references avoids
generating unnecessary expressions and keeps the length of the final generated
expressions under control.
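
To make the comparison concrete, here is a minimal, self-contained sketch of the two conversions on a toy expression tree. Everything in it (`CnfSketch`, `Pred`, `cnf`, `cnfGrouped`, `groupByRefs`, `partitionFilters`) is hypothetical and written only for illustration; it is not Catalyst's `Expression` API and not the code in this PR.

```scala
// Toy model only: these types stand in for Catalyst expressions.
object CnfSketch {
  sealed trait Expr { def refs: Set[String] }
  final case class Pred(col: String, text: String) extends Expr {
    def refs: Set[String] = Set(col)
  }
  final case class And(l: Expr, r: Expr) extends Expr {
    def refs: Set[String] = l.refs ++ r.refs
  }
  final case class Or(l: Expr, r: Expr) extends Expr {
    def refs: Set[String] = l.refs ++ r.refs
  }

  // Plain CNF: distribute `or` over `and`; returns the list of conjuncts.
  def cnf(e: Expr): Seq[Expr] = e match {
    case And(l, r) => cnf(l) ++ cnf(r)
    case Or(l, r)  => for (x <- cnf(l); y <- cnf(r)) yield Or(x, y)
    case other     => Seq(other)
  }

  // Merge conjuncts that reference the same set of columns, keeping
  // first-seen order so the output is deterministic.
  def groupByRefs(es: Seq[Expr]): Seq[Expr] =
    es.map(_.refs).distinct.map { r =>
      es.filter(_.refs == r).reduceLeft[Expr](And(_, _))
    }

  // CNF with grouping: group each side of an `or` before distributing.
  def cnfGrouped(e: Expr): Seq[Expr] = e match {
    case And(l, r) => cnfGrouped(l) ++ cnfGrouped(r)
    case Or(l, r) =>
      for (x <- groupByRefs(cnfGrouped(l)); y <- groupByRefs(cnfGrouped(r)))
        yield Or(x, y)
    case other => Seq(other)
  }

  // Keep only conjuncts whose references are all partition columns.
  def partitionFilters(conjuncts: Seq[Expr], partCols: Set[String]): Seq[Expr] =
    conjuncts.filter(_.refs.subsetOf(partCols))

  def render(e: Expr): String = e match {
    case Pred(_, t) => t
    case And(l, r)  => s"(${render(l)} and ${render(r)})"
    case Or(l, r)   => s"(${render(l)} or ${render(r)})"
  }

  // (dt = 2 and id < 100 and id > 20) or dt = 3
  val example: Expr = Or(
    And(And(Pred("dt", "dt = 2"), Pred("id", "id < 100")), Pred("id", "id > 20")),
    Pred("dt", "dt = 3"))
}
```

On `CnfSketch.example`, `cnf` yields three conjuncts and `cnfGrouped` yields two, while `partitionFilters(_, Set("dt"))` selects the same single conjunct in both cases (operand order inside each `or` differs from the hand-written form above, which does not affect the result).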