stevomitric commented on code in PR #55706:
URL: https://github.com/apache/spark/pull/55706#discussion_r3219178857
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveIdentifierClause.scala:
##########
@@ -82,6 +82,13 @@ class ResolveIdentifierClause(earlyBatches:
Seq[RuleExecutor[LogicalPlan]#Batch]
IdentifierResolution.evalIdentifierExpr(e.identifierExpr),
e.otherExprs)
}
}
+ // When `PlanWithUnresolvedIdentifier` materializes into a `CTEInChildren`
(e.g.
+ // `InsertIntoStatement`) inside an outer `WithCTE`, push the CTE defs
into the command's
+ // children - restoring the invariant from `CTESubstitution.withCTEDefs`.
+ resolved.resolveOperatorsUpWithPruning(_.containsPattern(CTE)) {
+ case WithCTE(c: CTEInChildren, cteDefs) => c.withCTEDefs(cteDefs)
Review Comment:
I tried this and reverted - the parser-level placement breaks legacy-mode
parameter binding (spark.sql.legacy.parameterSubstitution.constantsOnly=true)
for INSERT … IDENTIFIER(:p).
The problem: InsertIntoStatement.child = query (statements.scala:210), so
.table is not in children. BindParameters.bind (parameters.scala:177-186) walks
via resolveOperatorsDown (children-based) and transformExpressionsWithPruning
(expressions only) — neither reaches a LogicalPlan-typed non-child field.
Tree-pattern propagation in TreeNode.getDefaultTreePatternBits
(TreeNode.scala:104-108) also flows only through children, so even
containsPattern(PARAMETER) returns false on the wrapping
InsertIntoStatement and the rule prunes itself out. Result: the
NamedParameter inside PlanWithUnresolvedIdentifier.identifierExpr is never
bound, the placeholder never resolves, and analysis fails with
[UNSUPPORTED_INSERT.RDD_BASED]. The non-legacy path works because
withIdentClause short-circuits a Literal directly to UnresolvedRelation
(AstBuilder.scala:85) and never creates PlanWithUnresolvedIdentifier in the
first place.
Making the parser-level placement work would require teaching BindParameters
(and tree-pattern propagation) about the non-child placeholder slot on every
CTEInChildren write command, CTAS/RTAS name, plus future additions. That's a
broader refactor than this bug warrants. Happy to do the deeper refactor as a
separate change if you'd prefer.
##########
sql/core/src/test/scala/org/apache/spark/sql/ParametersSuite.scala:
##########
@@ -2460,4 +2460,53 @@ class ParametersSuite extends SharedSparkSession {
spark.sql("SELECT 1", Array.empty[Any]),
Row(1))
}
+
+ test("WITH ... INSERT OVERWRITE TABLE IDENTIFIER(:p) SELECT ... FROM cte") {
+ withTable("t_cte_overwrite") {
+ sql("CREATE TABLE t_cte_overwrite (a INT) USING PARQUET")
+ sql("INSERT INTO t_cte_overwrite VALUES (10)")
+ spark.sql(
+ """WITH transformation AS (SELECT 1 AS a)
+ |INSERT OVERWRITE TABLE IDENTIFIER(:tname)
+ |SELECT * FROM transformation""".stripMargin,
+ Map("tname" -> "t_cte_overwrite"))
+ checkAnswer(spark.table("t_cte_overwrite"), Row(1))
+ }
+ }
+
+ test("WITH ... INSERT INTO IDENTIFIER(:p) SELECT ... FROM cte") {
+ withTable("t_cte_into") {
+ sql("CREATE TABLE t_cte_into (a INT) USING PARQUET")
+ spark.sql(
+ """WITH transformation AS (SELECT 7 AS a)
+ |INSERT INTO IDENTIFIER(:tname)
+ |SELECT * FROM transformation""".stripMargin,
+ Map("tname" -> "t_cte_into"))
+ checkAnswer(spark.table("t_cte_into"), Row(7))
+ }
+ }
+
+ test("Analyzed plan does not leave WithCTE wrapping a CTEInChildren " +
Review Comment:
Added a CTAS variant - `CREATE TABLE IDENTIFIER(:p) AS WITH … SELECT * FROM
t`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]