Re: [PR] [CONNECT][SDP][SPARK-56249] Create AUTO CDC syntax for SCD Type 1 in SQL [spark]

via GitHub Mon, 29 Jun 2026 13:38:48 -0700


szehon-ho commented on code in PR #56419:
URL: https://github.com/apache/spark/pull/56419#discussion_r3494526634



##########
sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4:
##########
@@ -750,6 +751,41 @@ dmlStatementNoWith
         notMatchedBySourceClause*                                              
    #mergeIntoTable
     ;
 
+autoCdcCommand
+    : AUTO CDC INTO target=multipartIdentifier
+        autoCdcParameters
+    ;
+
+autoCdcBody
+    : AUTO CDC autoCdcParameters
+    ;
+
+autoCdcParameters
+    : FROM source=autoCdcSource
+        KEYS LEFT_PAREN keys=identifierSeq RIGHT_PAREN
+        autoCdcDeleteClause?
+        autoCdcSequenceByClause
+        autoCdcColumnsClause?
+    ;
+
+autoCdcSource
+    : STREAM LEFT_PAREN multipartIdentifier RIGHT_PAREN
+    ;
+
+autoCdcDeleteClause
+    : APPLY AS DELETE WHEN deleteCondition=booleanExpression
+    ;
+
+autoCdcSequenceByClause
+    : SEQUENCE BY sequence=expression
+    ;
+
+autoCdcColumnsClause
+    : COLUMNS (
+        columns=identifierSeq |

Review Comment:
   For consideration: the include-list has no parentheses while the `* EXCEPT` 
branch requires them, so the two forms are asymmetric and diverge from the 
established AUTO CDC / APPLY CHANGES syntax which parenthesizes both:
   
   ```sql
   -- this PR accepts (bare list):
   COLUMNS id, name, value
   -- but the conventional form is:
   COLUMNS (id, name, value)
   ```
   
   As written, a user typing `COLUMNS (id, name, value)` gets a parse error. 
Would it be cleaner to wrap the include list in `LEFT_PAREN ... RIGHT_PAREN` 
too, to match `* EXCEPT (...)` and the existing syntax?



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:
##########
@@ -1362,6 +1362,58 @@ class AstBuilder extends DataTypeAstBuilder
       withSchemaEvolution)
   }
 
+  protected def buildAutoCdcIntoCommand(ctx: AutoCdcCommandContext): 
AutoCdcIntoCommand =
+    withOrigin(ctx) {
+      val target = visitMultipartIdentifier(ctx.target).asTableIdentifier

Review Comment:
   For consideration: `asTableIdentifier` on a `Seq[String]` only accepts 1- or 
2-part names and otherwise throws `IDENTIFIER_TOO_MANY_NAME_PARTS`. So while 
the source supports 3 parts (`FROM STREAM(mycat.myschema.source)`, which is 
tested), a catalog-qualified target fails:
   
   ```sql
   -- works:
   CREATE FLOW f AS AUTO CDC INTO target FROM STREAM(mycat.myschema.source) 
KEYS (id) SEQUENCE BY ts;
   -- fails with IDENTIFIER_TOO_MANY_NAME_PARTS:
   CREATE FLOW f AS AUTO CDC INTO mycat.myschema.target FROM STREAM(source) 
KEYS (id) SEQUENCE BY ts;
   ```
   
   This is also inconsistent with the `CREATE STREAMING TABLE 
mycat.myschema.target FLOW AUTO CDC ...` form, which accepts the 3-part name 
(it uses `UnresolvedIdentifier`). Could we model the target as a multi-part 
identifier here too (or at least raise a clean 'not supported' error and add a 
3-part-target test)?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [CONNECT][SDP][SPARK-56249] Create AUTO CDC syntax for SCD Type 1 in SQL [spark]

Reply via email to