ulysses-you commented on PR #36845:
URL: https://github.com/apache/spark/pull/36845#issuecomment-1153742980
I can reproduce it with:
```sql
CREATE TABLE t1(c1 int) USING PARQUET PARTITIONED BY (p1 string);
CREATE TABLE t2(c2 int) USING PARQUET PARTITIONED BY (p2 string);
SELECT * from (
SELECT /*+ merge(t1) */ p1 FROM t1 JOIN t2 ON c1 = c2
) x JOIN t2 ON p1 = p2
WHERE
c2 > 0
```
The reason is that AQE + DPP inserts a broadcast exchange at the top of
`AdaptiveSparkPlanExec` when the broadcast can be reused. AQE's `re-optimize`
step contains some hacky code to preserve this behavior:
```scala
// When both enabling AQE and DPP, `PlanAdaptiveDynamicPruningFilters` rule will
// add the `BroadcastExchangeExec` node manually in the DPP subquery,
// not through `EnsureRequirements` rule. Therefore, when the DPP subquery is complicated
// and need to be re-optimized, AQE also need to manually insert the `BroadcastExchangeExec`
// node to prevent the loss of the `BroadcastExchangeExec` node in DPP subquery.
// Here, we also need to avoid to insert the `BroadcastExchangeExec` node when the newPlan
// is already the `BroadcastExchangeExec` plan after apply the `LogicalQueryStageStrategy` rule.
val finalPlan = currentPhysicalPlan match {
  case b: BroadcastExchangeLike
    if (!newPlan.isInstanceOf[BroadcastExchangeLike]) => b.withNewChildren(Seq(newPlan))
  case _ => newPlan
}
```
However, this code does not match when the top-level broadcast exchange is
wrapped in a query stage. That happens when the broadcast exchange added by
DPP runs before the normal broadcast exchange (e.g. one introduced by a join).
So we can match `BroadcastQueryStage(_, ReusedExchangeExec, _)` and skip the
optimization. There is no point in optimizing the child of a reused exchange
that exists only for broadcast.
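
To make the mismatch concrete, here is a minimal, self-contained sketch. It
does not use Spark's real classes: `Plan`, `Leaf`, `BroadcastExchange`,
`ReusedExchange` and `BroadcastQueryStage` are simplified stand-ins, and
`existingFinalPlan` and `shouldSkipReOptimize` are hypothetical helper names
chosen for illustration. It only models the node shapes discussed above, so the
actual change in the PR may look different.

```scala
// Simplified stand-ins for Spark's physical plan nodes (assumption: these are
// NOT the real Spark classes, they only model the shapes discussed above).
sealed trait Plan
case class Leaf(name: String) extends Plan
case class BroadcastExchange(child: Plan) extends Plan
case class ReusedExchange(child: Plan) extends Plan
// Three fields, mirroring the pattern BroadcastQueryStage(_, ReusedExchangeExec, _).
case class BroadcastQueryStage(id: Int, plan: Plan, canonicalized: Plan) extends Plan

object ReOptimizeSketch {
  // Models the existing match: the new plan is re-wrapped only when the
  // current top-level node is a bare broadcast exchange.
  def existingFinalPlan(current: Plan, newPlan: Plan): Plan = current match {
    case b: BroadcastExchange if !newPlan.isInstanceOf[BroadcastExchange] =>
      b.copy(child = newPlan)
    case _ => newPlan
  }

  // Models the proposed guard (hypothetical name): detect a query stage that
  // wraps a reused exchange, so the caller can skip re-optimization.
  def shouldSkipReOptimize(current: Plan): Boolean = current match {
    case BroadcastQueryStage(_, _: ReusedExchange, _) => true
    case _ => false
  }
}
```

With `bare = BroadcastExchange(Leaf("t2"))`, the existing match re-wraps a new
plan as expected. With `BroadcastQueryStage(0, ReusedExchange(bare), bare)` as
the current plan, the first case no longer matches and the broadcast exchange
is lost, while `shouldSkipReOptimize` returns `true` for exactly that shape.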