peter-toth commented on a change in pull request #28318:
URL: https://github.com/apache/spark/pull/28318#discussion_r414411578
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala
##########
@@ -31,53 +31,43 @@ object CTESubstitution extends Rule[LogicalPlan] {
def apply(plan: LogicalPlan): LogicalPlan = {
LegacyBehaviorPolicy.withName(SQLConf.get.getConf(LEGACY_CTE_PRECEDENCE_POLICY))
match {
case LegacyBehaviorPolicy.EXCEPTION =>
- assertNoNameConflictsInCTE(plan, inTraverse = false)
- traverseAndSubstituteCTE(plan, inTraverse = false)
+ assertNoNameConflictsInCTE(plan)
+ traverseAndSubstituteCTE(plan)
case LegacyBehaviorPolicy.LEGACY =>
legacyTraverseAndSubstituteCTE(plan)
case LegacyBehaviorPolicy.CORRECTED =>
- traverseAndSubstituteCTE(plan, inTraverse = false)
+ traverseAndSubstituteCTE(plan)
}
}
/**
* Check the plan to be traversed has naming conflicts in nested CTE or not,
traverse through
- * child, innerChildren and subquery for the current plan.
+ * child, innerChildren and subquery expressions for the current plan.
*/
private def assertNoNameConflictsInCTE(
plan: LogicalPlan,
- inTraverse: Boolean,
- cteNames: Set[String] = Set.empty): Unit = {
- plan.foreach {
+ namesInChildren: Set[String] = Set.empty,
+ namesInExpressions: Set[String] = Set.empty): Unit = {
Review comment:
Yes, we do. What we want to catch here is that legacy and corrected
modes MAY produce different results due to the different substitution order on
the same data. But there are cases like the above when both modes does the
substitution in the same order and we can be sure that the result is the same.
We usually say that in legacy mode the outer CTE takes precedence while in
corrected mode the inner one does.
While this statement is entirely true for corrected mode, the legacy mode is
a bit weird in terms that in case of we have a nested conflicting CTE in a
subquery the inner takes precedence (similar to the corrected mode). But if we
have a nested CTE in a subquery expression or in an inner children the outer
does (in contrast to the corrected mode).
`namesInExpressions` is used, when we traverse subquery expressions or inner
children (CTE definitions) we pass `namesInExpressions` into `namesInChildren`.
I admit that the naming of these parameters might be confusing
`namesInChildren` is what we actually check for conflicts.
`namesInChildren` is what we pass along when we traverse children (in this
case the 2 different modes do the substitution in the same order and inner
always takes precedence so we don't expand this set with new names when we
encounter a `With`).
`namesInExpressions` is what we pass along when we traverse subquery
expressions and inner children (in this case the 2 different modes do the
substitution different order). But once we encounter a subquery expressions or
an inner children we need to switch the possible conflicting set to the
`namesInExpressions`.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]