Hisoka-X commented on code in PR #41347:
URL: https://github.com/apache/spark/pull/41347#discussion_r1263227170


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DeduplicateRelations.scala:
##########
@@ -117,37 +311,49 @@ object DeduplicateRelations extends Rule[LogicalPlan] {
           }
         }
 
-        val planWithNewSubquery = plan.transformExpressions {
-          case subquery: SubqueryExpression =>
-            val (renewed, changed) = 
renewDuplicatedRelations(existingRelations, subquery.plan)
-            if (changed) planChanged = true
-            subquery.withNewPlan(renewed)
-        }
+        val (newPlan, changed) = getNewPlanWithNewChildren(existingRelations, 
newChildren.toArray,
+          plan, planChanged)
+        planChanged |= changed
+        newPlan
+      } else {
+        plan
+      }
+      (newPlan, planChanged)
+  }
 
-        if (planChanged) {
-          if (planWithNewSubquery.childrenResolved) {
-            val planWithNewChildren = 
planWithNewSubquery.withNewChildren(newChildren.toSeq)
-            val attrMap = AttributeMap(
-              plan
-                .children
-                .flatMap(_.output).zip(newChildren.flatMap(_.output))
-                .filter { case (a1, a2) => a1.exprId != a2.exprId }
-            )
-            if (attrMap.isEmpty) {
-              planWithNewChildren
-            } else {
-              planWithNewChildren.rewriteAttrs(attrMap)
-            }
-          } else {
-            planWithNewSubquery.withNewChildren(newChildren.toSeq)
-          }
-        } else {
+  private def getNewPlanWithNewChildren(

Review Comment:
   It is different after the change. We need to recursively process the node 
and refresh its relationship. Before the change, there is only one 
`MultiInstanceRelation`, and it has no child, so you can do 
`getNewPlanWithNewChildren` in the LogicalPlan case (because it will match 
LogicalPlan except for `MultiInstanceRelation`, but this is not the case now) . 
But for example, for `Project`, we need to process the child of the `Project` 
first when we match the `Project`, and call `getNewPlanWithNewChildren` on the 
child of the `Project` at the same time. Then we can deal with the conflicts of 
the `Project`. It's hard to do this without extract out a function.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to