Hisoka-X commented on code in PR #41347:
URL: https://github.com/apache/spark/pull/41347#discussion_r1263227170
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DeduplicateRelations.scala:
##########
@@ -117,37 +311,49 @@ object DeduplicateRelations extends Rule[LogicalPlan] {
}
}
- val planWithNewSubquery = plan.transformExpressions {
- case subquery: SubqueryExpression =>
- val (renewed, changed) =
renewDuplicatedRelations(existingRelations, subquery.plan)
- if (changed) planChanged = true
- subquery.withNewPlan(renewed)
- }
+ val (newPlan, changed) = getNewPlanWithNewChildren(existingRelations,
newChildren.toArray,
+ plan, planChanged)
+ planChanged |= changed
+ newPlan
+ } else {
+ plan
+ }
+ (newPlan, planChanged)
+ }
- if (planChanged) {
- if (planWithNewSubquery.childrenResolved) {
- val planWithNewChildren =
planWithNewSubquery.withNewChildren(newChildren.toSeq)
- val attrMap = AttributeMap(
- plan
- .children
- .flatMap(_.output).zip(newChildren.flatMap(_.output))
- .filter { case (a1, a2) => a1.exprId != a2.exprId }
- )
- if (attrMap.isEmpty) {
- planWithNewChildren
- } else {
- planWithNewChildren.rewriteAttrs(attrMap)
- }
- } else {
- planWithNewSubquery.withNewChildren(newChildren.toSeq)
- }
- } else {
+ private def getNewPlanWithNewChildren(
Review Comment:
It is different after the change. We need to recursively process the node
and refresh its relationship. Before the change, there is only one
`MultiInstanceRelation`, and it has no child, so you can do
`getNewPlanWithNewChildren` in the LogicalPlan case (because it will match
LogicalPlan except for `MultiInstanceRelation`, but this is not the case now) .
But for example, for `Project`, we need to process the child of the `Project`
first when we match the `Project`, and call `getNewPlanWithNewChildren` on the
child of the `Project` at the same time. Then we can deal with the conflicts of
the `Project`. It's hard to do this without extract out a function.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]