peter-toth commented on a change in pull request #28318:
URL: https://github.com/apache/spark/pull/28318#discussion_r414411578



##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala
##########
@@ -31,53 +31,43 @@ object CTESubstitution extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = {
     
LegacyBehaviorPolicy.withName(SQLConf.get.getConf(LEGACY_CTE_PRECEDENCE_POLICY))
 match {
       case LegacyBehaviorPolicy.EXCEPTION =>
-        assertNoNameConflictsInCTE(plan, inTraverse = false)
-        traverseAndSubstituteCTE(plan, inTraverse = false)
+        assertNoNameConflictsInCTE(plan)
+        traverseAndSubstituteCTE(plan)
       case LegacyBehaviorPolicy.LEGACY =>
         legacyTraverseAndSubstituteCTE(plan)
       case LegacyBehaviorPolicy.CORRECTED =>
-        traverseAndSubstituteCTE(plan, inTraverse = false)
+        traverseAndSubstituteCTE(plan)
     }
   }
 
   /**
    * Check the plan to be traversed has naming conflicts in nested CTE or not, 
traverse through
-   * child, innerChildren and subquery for the current plan.
+   * child, innerChildren and subquery expressions for the current plan.
    */
   private def assertNoNameConflictsInCTE(
       plan: LogicalPlan,
-      inTraverse: Boolean,
-      cteNames: Set[String] = Set.empty): Unit = {
-    plan.foreach {
+      namesInChildren: Set[String] = Set.empty,
+      namesInExpressions: Set[String] = Set.empty): Unit = {

Review comment:
       Yes, we do. What we want to catch here is that legacy and corrected 
modes MAY produce different results due to the different substitution order on 
the same data. But there are cases like the above when both modes does the 
substitution in the same order and we can be sure that the result is the same.
   
   We usually say that in legacy mode the outer CTE takes precedence while in 
corrected mode the inner one does.
   While this statement is entirely true for corrected mode, the legacy mode is 
a bit weird in terms that in case of we have a nested conflicting CTE in a 
subquery the inner takes precedence (similar to the corrected mode). But if we 
have a nested CTE in a subquery expression or in an inner children the outer 
does (in contrast to the corrected mode).
   
   `namesInExpressions` is used, when we traverse subquery expressions or inner 
children (CTE definitions) we pass `namesInExpressions` into `namesInChildren`.
   I admit that the naming of these parameters might be confusing 
`namesInChildren` is what we actually check for conflicts. `namesInExpressions` 
is there because the conflicting set of names are different when we traverse 
subquery expressions or inner children.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to