Re: [PR] [SPARK-46625][SQL] Push WithCTE into CTEInChildren when materialised by ResolveIdentifierClause [spark]

via GitHub Mon, 11 May 2026 06:17:47 -0700


stevomitric commented on code in PR #55706:
URL: https://github.com/apache/spark/pull/55706#discussion_r3219178857



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveIdentifierClause.scala:
##########
@@ -82,6 +82,13 @@ class ResolveIdentifierClause(earlyBatches: 
Seq[RuleExecutor[LogicalPlan]#Batch]
               IdentifierResolution.evalIdentifierExpr(e.identifierExpr), 
e.otherExprs)
         }
     }
+    // When `PlanWithUnresolvedIdentifier` materializes into a `CTEInChildren` 
(e.g.
+    // `InsertIntoStatement`) inside an outer `WithCTE`, push the CTE defs 
into the command's
+    // children - restoring the invariant from `CTESubstitution.withCTEDefs`.
+    resolved.resolveOperatorsUpWithPruning(_.containsPattern(CTE)) {
+      case WithCTE(c: CTEInChildren, cteDefs) => c.withCTEDefs(cteDefs)

Review Comment:
   I tried this and reverted - the parser-level placement breaks legacy-mode 
parameter binding (spark.sql.legacy.parameterSubstitution.constantsOnly=true) 
for INSERT … IDENTIFIER(:p).
   
   The problem: InsertIntoStatement.child = query (statements.scala:210), so 
.table is not in children. BindParameters.bind (parameters.scala:177-186) walks 
via resolveOperatorsDown (children-based) and transformExpressionsWithPruning 
(expressions only) — neither reaches a LogicalPlan-typed non-child field. 
Tree-pattern propagation in TreeNode.getDefaultTreePatternBits 
(TreeNode.scala:104-108) also flows only through children, so even 
containsPattern(PARAMETER) returns false on the wrapping 
   
   InsertIntoStatement and the rule prunes itself out. Result: the 
NamedParameter inside PlanWithUnresolvedIdentifier.identifierExpr is never 
bound, the placeholder never resolves, and analysis fails with 
[UNSUPPORTED_INSERT.RDD_BASED]. The non-legacy path works because 
withIdentClause short-circuits a Literal directly to UnresolvedRelation 
(AstBuilder.scala:85) and never creates PlanWithUnresolvedIdentifier in the 
first place.
   
   Making the parser-level placement work would require teaching BindParameters 
(and tree-pattern propagation) about the non-child placeholder slot on every 
CTEInChildren write command, CTAS/RTAS name, plus future additions. That's a 
broader refactor than this bug warrants. Happy to do the deeper refactor as a 
separate change if you'd prefer.



##########
sql/core/src/test/scala/org/apache/spark/sql/ParametersSuite.scala:
##########
@@ -2460,4 +2460,53 @@ class ParametersSuite extends SharedSparkSession {
       spark.sql("SELECT 1", Array.empty[Any]),
       Row(1))
   }
+
+  test("WITH ... INSERT OVERWRITE TABLE IDENTIFIER(:p) SELECT ... FROM cte") {
+    withTable("t_cte_overwrite") {
+      sql("CREATE TABLE t_cte_overwrite (a INT) USING PARQUET")
+      sql("INSERT INTO t_cte_overwrite VALUES (10)")
+      spark.sql(
+        """WITH transformation AS (SELECT 1 AS a)
+          |INSERT OVERWRITE TABLE IDENTIFIER(:tname)
+          |SELECT * FROM transformation""".stripMargin,
+        Map("tname" -> "t_cte_overwrite"))
+      checkAnswer(spark.table("t_cte_overwrite"), Row(1))
+    }
+  }
+
+  test("WITH ... INSERT INTO IDENTIFIER(:p) SELECT ... FROM cte") {
+    withTable("t_cte_into") {
+      sql("CREATE TABLE t_cte_into (a INT) USING PARQUET")
+      spark.sql(
+        """WITH transformation AS (SELECT 7 AS a)
+          |INSERT INTO IDENTIFIER(:tname)
+          |SELECT * FROM transformation""".stripMargin,
+        Map("tname" -> "t_cte_into"))
+      checkAnswer(spark.table("t_cte_into"), Row(7))
+    }
+  }
+
+  test("Analyzed plan does not leave WithCTE wrapping a CTEInChildren " +

Review Comment:
    Added a CTAS variant - `CREATE TABLE IDENTIFIER(:p) AS WITH … SELECT * FROM 
t`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [SPARK-46625][SQL] Push WithCTE into CTEInChildren when materialised by ResolveIdentifierClause [spark]

Reply via email to