Swinky commented on pull request #34062:
URL: https://github.com/apache/spark/pull/34062#issuecomment-925108409
> do you mean `InferFiltersFromConstraints` can generate static partition
> predicates and we don't need to trigger DPP in that case?

@cloud-fan correct, examples below:
dimTable `d` has columns (d1, d2...)
factTable `f` has columns (f1, f2, f3...) partitioned on f1, f2.
Example 1:
```
        join(d1=f1)
        /         \
Filter(d1=100)   FactTable(f)
      |
 dimTable(d)
```
PartitionFilters for FactTable now: [f1=100, f1 in (d1 values from
dpp-subquery)] // "f1=100" here is inferred in `InferFiltersFromConstraints`
After Proposed change: [f1=100]
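
The inference of `f1=100` in Example 1 can be pictured with a toy model. This is a minimal sketch in plain Python, not Spark's `InferFiltersFromConstraints` rule; the function name `infer_static_filters` and its dictionary/tuple inputs are hypothetical and only mirror the example above.

```python
def infer_static_filters(dim_equalities, join_keys):
    """Propagate `dim_col = literal` predicates through equi-join keys.

    dim_equalities: dict of dimension column -> literal, e.g. {"d1": 100}
    join_keys: list of (dim_col, fact_col) pairs, e.g. [("d1", "f1")]
    Returns a dict of fact column -> literal (hypothetical representation).
    """
    return {
        fact_col: dim_equalities[dim_col]
        for dim_col, fact_col in join_keys
        if dim_col in dim_equalities
    }


# Example 1: Filter(d1=100) joined on d1=f1 gives the static filter f1=100.
print(infer_static_filters({"d1": 100}, [("d1", "f1")]))  # {'f1': 100}
```

Once `f1=100` is known statically, a DPP subquery on `f1` can only return the same value, which is why it adds no pruning in this example.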
Example 2:
```
      join(d1=f1, d2=f2)
        /            \
Filter(d1=100)    FactTable(f)
      |
 dimTable(d)
```
PartitionFilters for FactTable now: [f1=100, f1 in (d1 values from
dpp-subquery1), f2 in (d2 values from dpp-subquery1)] // "f1=100" here is
inferred in `InferFiltersFromConstraints`
After Proposed change: [f1=100, f2 in (d2 values from dpp-subquery1)]
Example 3:
```
           join(d1=f1, d2=f2)
             /            \
Filter(d1=100 || d3=200)   FactTable(f)
           |
      dimTable(d)
```
PartitionFilters for FactTable now: [f1 in (d1 values from
dpp-subquery1), f2 in (d2 values from dpp-subquery1)]
After Proposed change: No change in this case, as the filter's references
are neither a subset of d1 nor a subset of d3, so no static predicate on f1
is inferred and both DPP filters remain.
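
The three examples can be summarized with a small toy model. This is a sketch in plain Python under simplifying assumptions, not the code in this PR: `partition_filters`, its string output, and the `skip_redundant_dpp` flag are hypothetical names used only to contrast the current and proposed behavior.

```python
def partition_filters(join_keys, static_filters, skip_redundant_dpp):
    """List the partition filters applied to the fact table.

    join_keys: list of (dim_col, fact_col) equi-join pairs
    static_filters: dict of fact partition column -> literal that was
        already inferred (e.g. by constraint propagation)
    skip_redundant_dpp: True models the proposed change, False the
        current behavior (hypothetical flag for this sketch only)
    """
    filters = [f"{col}={value}" for col, value in static_filters.items()]
    for dim_col, fact_col in join_keys:
        # Proposed change: a column with a static predicate needs no DPP.
        if skip_redundant_dpp and fact_col in static_filters:
            continue
        filters.append(f"{fact_col} in ({dim_col} values from dpp-subquery)")
    return filters


keys = [("d1", "f1"), ("d2", "f2")]

# Example 2: f1=100 was inferred, so only the DPP filter on f2 survives.
print(partition_filters(keys, {"f1": 100}, skip_redundant_dpp=True))

# Example 3: nothing was inferred, so both DPP filters stay either way.
print(partition_filters(keys, {}, skip_redundant_dpp=True))
```

Calling the function with `skip_redundant_dpp=False` reproduces the "now" lists in the examples, which makes the difference easy to compare side by side.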