szehon-ho commented on code in PR #56419:
URL: https://github.com/apache/spark/pull/56419#discussion_r3494526634
##########
sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4:
##########
@@ -750,6 +751,41 @@ dmlStatementNoWith
notMatchedBySourceClause*
#mergeIntoTable
;
+autoCdcCommand
+ : AUTO CDC INTO target=multipartIdentifier
+ autoCdcParameters
+ ;
+
+autoCdcBody
+ : AUTO CDC autoCdcParameters
+ ;
+
+autoCdcParameters
+ : FROM source=autoCdcSource
+ KEYS LEFT_PAREN keys=identifierSeq RIGHT_PAREN
+ autoCdcDeleteClause?
+ autoCdcSequenceByClause
+ autoCdcColumnsClause?
+ ;
+
+autoCdcSource
+ : STREAM LEFT_PAREN multipartIdentifier RIGHT_PAREN
+ ;
+
+autoCdcDeleteClause
+ : APPLY AS DELETE WHEN deleteCondition=booleanExpression
+ ;
+
+autoCdcSequenceByClause
+ : SEQUENCE BY sequence=expression
+ ;
+
+autoCdcColumnsClause
+ : COLUMNS (
+ columns=identifierSeq |
Review Comment:
For consideration: the include-list has no parentheses while the `* EXCEPT`
branch requires them, so the two forms are asymmetric and diverge from the
established AUTO CDC / APPLY CHANGES syntax which parenthesizes both:
```sql
-- this PR accepts (bare list):
COLUMNS id, name, value
-- but the conventional form is:
COLUMNS (id, name, value)
```
As written, a user typing `COLUMNS (id, name, value)` gets a parse error.
Would it be cleaner to wrap the include list in `LEFT_PAREN ... RIGHT_PAREN`
too, to match `* EXCEPT (...)` and the existing syntax?
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:
##########
@@ -1362,6 +1362,58 @@ class AstBuilder extends DataTypeAstBuilder
withSchemaEvolution)
}
+ protected def buildAutoCdcIntoCommand(ctx: AutoCdcCommandContext):
AutoCdcIntoCommand =
+ withOrigin(ctx) {
+ val target = visitMultipartIdentifier(ctx.target).asTableIdentifier
Review Comment:
For consideration: `asTableIdentifier` on a `Seq[String]` only accepts 1- or
2-part names and otherwise throws `IDENTIFIER_TOO_MANY_NAME_PARTS`. So while
the source supports 3 parts (`FROM STREAM(mycat.myschema.source)`, which is
tested), a catalog-qualified target fails:
```sql
-- works:
CREATE FLOW f AS AUTO CDC INTO target FROM STREAM(mycat.myschema.source)
KEYS (id) SEQUENCE BY ts;
-- fails with IDENTIFIER_TOO_MANY_NAME_PARTS:
CREATE FLOW f AS AUTO CDC INTO mycat.myschema.target FROM STREAM(source)
KEYS (id) SEQUENCE BY ts;
```
This is also inconsistent with the `CREATE STREAMING TABLE
mycat.myschema.target FLOW AUTO CDC ...` form, which accepts the 3-part name
(it uses `UnresolvedIdentifier`). Could we model the target as a multi-part
identifier here too (or at least raise a clean 'not supported' error and add a
3-part-target test)?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]