peter-toth commented on a change in pull request #28318:
URL: https://github.com/apache/spark/pull/28318#discussion_r414674094



##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala
##########
@@ -31,53 +31,44 @@ object CTESubstitution extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = {
     
LegacyBehaviorPolicy.withName(SQLConf.get.getConf(LEGACY_CTE_PRECEDENCE_POLICY))
 match {
       case LegacyBehaviorPolicy.EXCEPTION =>
-        assertNoNameConflictsInCTE(plan, inTraverse = false)
-        traverseAndSubstituteCTE(plan, inTraverse = false)
+        assertNoNameConflictsInCTE(plan)
+        traverseAndSubstituteCTE(plan)
       case LegacyBehaviorPolicy.LEGACY =>
         legacyTraverseAndSubstituteCTE(plan)
       case LegacyBehaviorPolicy.CORRECTED =>
-        traverseAndSubstituteCTE(plan, inTraverse = false)
+        traverseAndSubstituteCTE(plan)
     }
   }
 
   /**
    * Check the plan to be traversed has naming conflicts in nested CTE or not, 
traverse through
-   * child, innerChildren and subquery for the current plan.
+   * child, innerChildren and subquery expressions for the current plan.
    */
   private def assertNoNameConflictsInCTE(
       plan: LogicalPlan,
-      inTraverse: Boolean,
-      cteNames: Set[String] = Set.empty): Unit = {
-    plan.foreach {
+      outerCTERelationNames: Set[String] = Set.empty,
+      namesInSubqueries: Set[String] = Set.empty): Unit = {
+    plan match {
       case w @ With(child, relations) =>
         val newNames = relations.map {
           case (cteName, _) =>
-            if (cteNames.contains(cteName)) {
+            if (outerCTERelationNames.contains(cteName)) {
               throw new AnalysisException(s"Name $cteName is ambiguous in 
nested CTE. " +
                 s"Please set ${LEGACY_CTE_PRECEDENCE_POLICY.key} to CORRECTED 
so that name " +
                 "defined in inner CTE takes precedence. If set it to LEGACY, 
outer CTE " +
                 "definitions will take precedence. See more details in 
SPARK-28228.")
             } else {
               cteName
             }
-        }.toSet
-        child.transformExpressions {
-          case e: SubqueryExpression =>
-            assertNoNameConflictsInCTE(e.plan, inTraverse = true, cteNames ++ 
newNames)
-            e
-        }
-        w.innerChildren.foreach { p =>
-          assertNoNameConflictsInCTE(p, inTraverse = true, cteNames ++ 
newNames)
-        }
-
-      case other if inTraverse =>
-        other.transformExpressions {
-          case e: SubqueryExpression =>
-            assertNoNameConflictsInCTE(e.plan, inTraverse = true, cteNames)
-            e
-        }
+        }.toSet ++ namesInSubqueries
+        assertNoNameConflictsInCTE(child, outerCTERelationNames, newNames)

Review comment:
       I checked 3. in Spark 2.4 with the query in the description and it 
returns `1` so is not handled correctly and we should fail.
   
   1. In this case the legacy and corrected mode returns the same result due to 
the same substitution order so we shouldn't throw an exception. If we call the 
possible conflicting set of names coming from outer CTEs as 
`outerCTERelationNames` then I handle this case by not extending the set with 
new names when I traverse down the child of a `With` 
(https://github.com/apache/spark/pull/28318/files#diff-d0bfa3367c63988ad7cf33397e643e75R64)
 or when I traverse down the children of any other node 
(https://github.com/apache/spark/pull/28318/files#diff-d0bfa3367c63988ad7cf33397e643e75R70-R71).
   
   2. and 3. In these cases, the two modes have different substitution order 
and so possibly different results. Obviously we need an extended set of names 
(`newNames`) when we encounter a `With` and switch to that set (pass in as a 
new `outerCTERelationNames`) when we traverse down inner children (2.)( 
https://github.com/apache/spark/pull/28318/files#diff-d0bfa3367c63988ad7cf33397e643e75R65)
 or traverse down subquery expressions 
(3.)(https://github.com/apache/spark/pull/28318/files#diff-d0bfa3367c63988ad7cf33397e643e75R68-R69)
   
   So what is the contents of `outerCTERelationNames`? The set of CTE names 
coming from outer CTEs. It doesn't contain the CTEs coming from a parent `With` 
from the same `LogicalPlan` (1.). But it contains outer CTEs coming from 
`With`s from outer `LogicalPlan`s (2., 3.)
   And the content of `namesInSubqueries` is simply all CTE names from all 
outer `With`s, no matter if the `With` is the same plan or in an outer plan.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to