cloud-fan commented on pull request #32921:
URL: https://github.com/apache/spark/pull/32921#issuecomment-865762921
Right now, we don't have a dedicated phase for executing DPP subqueries.
They are treated like normal subqueries and are executed right before we
execute the main query.
Let's think about the non-AQE case first. We need to run EnsureRequirements
after DPP in case the output partitioning changes, and the DPP subqueries need
to be executed before that. And before executing them, we need to finish
optimizing the main query and apply exchange/subquery reuse.
Given that, I think we should execute DPP subqueries after the query plan is
fully optimized and ready to execute. To be safe, I think we should run the
rule that triggers DPP subquery execution, and apply DS v2 pushdown, after all
the existing physical rules have run, i.e.:
```
CoalesceBucketsInJoin,
PlanDynamicPruningFilters(sparkSession),
PlanSubqueries(sparkSession),
...
ReuseExchangeAndSubquery,
// Above are existing physical rules
PushRuntimeFiltersToDataSource,
EnsureRequirements // This is the second run
```
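To illustrate why EnsureRequirements needs a second run, here is a toy model in plain Java (not real Spark code; the class and method names below are invented for illustration): a pushdown rule changes a scan's reported output partitioning as a side effect, and a simplified EnsureRequirements pass inserts a shuffle only when the partitioning no longer satisfies the join's required distribution.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the proposed rule ordering; all names are hypothetical.
public class RuleOrderingSketch {
    enum Partitioning { HASH_ON_KEY, UNKNOWN }

    static class Scan {
        Partitioning outputPartitioning = Partitioning.HASH_ON_KEY;
        boolean runtimeFilterPushed = false;
    }

    // Hypothetical pushdown rule: the source accepts the runtime filter,
    // but as a side effect it may report a different output partitioning.
    static void pushRuntimeFiltersToDataSource(Scan scan) {
        scan.runtimeFilterPushed = true;
        scan.outputPartitioning = Partitioning.UNKNOWN; // partitioning changed
    }

    // Simplified EnsureRequirements: insert a shuffle only when the scan's
    // partitioning no longer satisfies the join's required distribution.
    static List<String> ensureRequirements(Scan scan, Partitioning required) {
        List<String> plan = new ArrayList<>();
        if (scan.outputPartitioning != required) {
            plan.add("Exchange"); // extra shuffle
        }
        plan.add("Scan");
        return plan;
    }

    public static void main(String[] args) {
        Scan scan = new Scan();
        Partitioning required = Partitioning.HASH_ON_KEY;

        // First EnsureRequirements run: the scan satisfies the join.
        System.out.println(ensureRequirements(scan, required));

        // Pushdown runs after all the existing physical rules...
        pushRuntimeFiltersToDataSource(scan);

        // ...so EnsureRequirements must run a second time to fix the plan.
        System.out.println(ensureRequirements(scan, required));
    }
}
```

If the pushdown ran before the first EnsureRequirements, the extra shuffle would already be planned; running it after, with a second EnsureRequirements pass, is what keeps the plan correct.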
AQE would be more complicated, as the fully optimized query plan only becomes
available at the query stage optimization phase, where changing stage
boundaries is no longer allowed.
I agree that it's better to allow the v2 source to change its output
partitioning after runtime filter pushdown, but I'm not sure we should allow
it if it introduces extra shuffles, since the cost of extra shuffles can be
large.
I think we can simplify the design if we don't allow runtime filter pushdown
to introduce extra shuffles. Spark can give the v2 source both the runtime
filter and the required distribution, so that the v2 source can handle them
properly and change its output partitioning as long as it still satisfies the
required distribution.
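A minimal sketch of that contract, under the assumption of a hypothetical API (this is not the real DS v2 `SupportsRuntimeFiltering` interface; the names below are invented): Spark passes the runtime filter and the required distribution in a single call, and the source may only pick an output partitioning that still satisfies the requirement, so no extra shuffle is ever needed.

```java
import java.util.List;

// Hypothetical API sketch; every name here is made up for illustration.
public class RuntimeFilterContract {
    enum Distribution { CLUSTERED_BY_KEY, ANY }

    interface V2ScanWithRuntimeFilter {
        // Returns the output distribution the scan will produce after
        // accepting the filter; it must satisfy `required`.
        Distribution filter(List<String> runtimeFilterValues, Distribution required);
    }

    // Example source: it would prefer to re-cluster after filtering, but it
    // falls back to the required distribution to avoid an extra shuffle.
    static class ExampleScan implements V2ScanWithRuntimeFilter {
        @Override
        public Distribution filter(List<String> values, Distribution required) {
            Distribution preferred = Distribution.ANY; // cheaper for the source
            // Only change the output partitioning if Spark doesn't care.
            return required == Distribution.ANY ? preferred : required;
        }
    }

    public static void main(String[] args) {
        V2ScanWithRuntimeFilter scan = new ExampleScan();
        // The join requires clustering: the source must keep satisfying it.
        System.out.println(scan.filter(List.of("k1"), Distribution.CLUSTERED_BY_KEY));
        // No requirement: the source is free to change its partitioning.
        System.out.println(scan.filter(List.of("k1"), Distribution.ANY));
    }
}
```

The design choice this models: because the required distribution is part of the call, a conforming source can never invalidate the plan, so EnsureRequirements would not need to insert a shuffle after pushdown.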
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]