dtenedor commented on code in PR #48047:
URL: https://github.com/apache/spark/pull/48047#discussion_r1757500490
##########
sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4:
##########
@@ -604,6 +604,7 @@ queryTerm
         operator=INTERSECT setQuantifier? right=queryTerm                                #setOperation
     | left=queryTerm {!legacy_setops_precedence_enabled}?
         operator=(UNION | EXCEPT | SETMINUS) setQuantifier? right=queryTerm              #setOperation
+    | left=queryTerm OPERATOR_PIPE operatorPipeRightSide                                 #operatorPipeStatement
Review Comment:
Sure, no problem, I can try to explain it.

ANTLR first tokenizes each SQL query it receives, converting the input string
into a sequence of tokens (using `SqlBaseLexer.g4`). The parser's job (in this
file) is then to convert that sequence of tokens into a parse tree, from which
we build the initial unresolved logical plan.
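
To make the tokenization step concrete, here is a minimal Python sketch of a
lexer. It is not Spark's lexer: `TOKEN_SPEC`, `tokenize`, and the generic
`WORD` and `INTEGER` token names are assumptions for illustration (the real
`SqlBaseLexer.g4` defines dedicated tokens for keywords such as `TABLE` and
`SELECT`). Only the `OPERATOR_PIPE` name and its `|>` text come from this PR.

```python
import re

# Hypothetical, heavily simplified token table. Only OPERATOR_PIPE mirrors
# the PR; WORD, INTEGER and SKIP are stand-ins for the real lexer rules.
TOKEN_SPEC = [
    ("OPERATOR_PIPE", r"\|>"),
    ("INTEGER", r"\d+"),
    ("WORD", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(sql):
    """Convert an input string into a list of (token_type, text) pairs."""
    tokens = []
    pos = 0
    while pos < len(sql):
        match = TOKEN_RE.match(sql, pos)
        if match is None:
            raise ValueError(f"unexpected character {sql[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":  # whitespace is dropped, not emitted
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Under these assumptions, `tokenize("TABLE t |> SELECT x")` yields five tokens,
the third of which is `("OPERATOR_PIPE", "|>")`. The parser only ever sees this
token sequence, never the raw string.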

To do so, the parser checks the alternatives of each rule in the order listed,
comparing the tokens at the current position in the sequence against the
tokens that the alternative requires. If an alternative matches, meaning that
all of its keywords and other components map to corresponding input tokens,
then the parser produces the corresponding parse tree node, and the logic in
`AstBuilder.scala` turns that node into its unresolved logical plan tree.

In this case, we define the new token `OPERATOR_PIPE: '|>';` in
`SqlBaseLexer.g4`. Then we add a new alternative to the existing `queryTerm`
rule, which allows any syntax matching an existing `queryTerm` to appear on the
left side of this `|>` token, followed by the syntax of
`operatorPipeRightSide` on the right side (which in this PR is limited to a
`selectClause`).

ANTLR 4 grammars allow directly left-recursive rules, wherein an alternative
may begin with a reference to the rule itself, so the `queryTerm` on the left
side may match any valid existing syntax for a `queryTerm`, such as `TABLE t`,
a table subquery, etc. Since we are extending `queryTerm` to also match
`queryTerm OPERATOR_PIPE operatorPipeRightSide`, this alternative implements
the recursion that lets us chain multiple pipe operators together. For
example, in `TABLE t |> SELECT x |> LIMIT 2` (where the `LIMIT` step assumes
that `operatorPipeRightSide` is later extended beyond a `selectClause`),
`TABLE t` matches a `queryTerm`, then `TABLE t |> SELECT x` matches another,
and finally the entire query matches a third, applying the new recursive
`#operatorPipeStatement` alternative twice.
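
The chaining described above can be sketched in Python as a loop that keeps
wrapping the tree built so far, which is a common way to hand-write a
left-recursive rule. This is only an illustration under simplifying
assumptions: `parse_query_term` and `count_pipes` are hypothetical helpers,
tokens are plain whitespace-separated strings, and anything between two `|>`
tokens is accepted as a right side without further checks.

```python
def parse_query_term(tokens):
    """Parse `primary (|> right_side)*` into a left-nested tuple tree."""
    # Base case: everything before the first pipe stands in for an
    # existing (non-pipe) queryTerm such as `TABLE t`.
    pos = 0
    while pos < len(tokens) and tokens[pos] != "|>":
        pos += 1
    if pos == 0:
        raise ValueError("a pipe operator needs a queryTerm on its left side")
    tree = ("queryTerm", " ".join(tokens[:pos]))

    # Recursive case: each `|> right_side` wraps the tree built so far, so
    # the previous result becomes the left side of the next pipe.
    while pos < len(tokens):
        start = pos + 1  # skip the "|>" token itself
        end = start
        while end < len(tokens) and tokens[end] != "|>":
            end += 1
        if end == start:
            raise ValueError("missing right side after '|>'")
        tree = ("operatorPipeStatement", tree, " ".join(tokens[start:end]))
        pos = end
    return tree


def count_pipes(tree):
    """Count how many times the pipe alternative was applied."""
    if tree[0] == "queryTerm":
        return 0
    return 1 + count_pipes(tree[1])
```

For `"TABLE t |> SELECT x |> LIMIT 2".split()`, the outermost node is the
`LIMIT 2` pipe, its left child is the `SELECT x` pipe, and the innermost node
is `TABLE t`, so `count_pipes` reports two applications.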

Otherwise, if an alternative does not match, the parser moves on to try the
next one in the sequence, and so on, similar to a Scala pattern match. This
ordering defines the precedence of the alternatives relative to each other:
the ones listed first in `SqlBaseParser.g4` take precedence.
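
As a rough illustration of how listing order can translate into precedence,
here is a small precedence-climbing sketch in Python. It is not how ANTLR is
implemented. `BINARY_ALTERNATIVES`, `binding_power`, and
`parse_set_expression` are hypothetical names, operands are single tokens, and
only the two set-operation alternatives visible in the diff above are modeled
(with the legacy precedence flag assumed to be off).

```python
# Earlier position in this list means higher precedence (binds tighter),
# mirroring the order of the alternatives in the grammar rule.
BINARY_ALTERNATIVES = [
    ("INTERSECT",),
    ("UNION", "EXCEPT", "MINUS"),
]


def binding_power(token):
    """Return a positive power for operators, 0 for anything else."""
    for index, operators in enumerate(BINARY_ALTERNATIVES):
        if token in operators:
            return len(BINARY_ALTERNATIVES) - index
    return 0


def _parse(tokens, pos, min_power):
    if pos >= len(tokens):
        raise ValueError("expected an operand")
    left = tokens[pos]  # operands are single tokens in this sketch
    pos += 1
    while pos < len(tokens) and binding_power(tokens[pos]) >= min_power:
        operator = tokens[pos]
        power = binding_power(operator)
        # Requiring power + 1 on the right makes operators left-associative.
        right, pos = _parse(tokens, pos + 1, power + 1)
        left = (operator, left, right)
    return left, pos


def parse_set_expression(tokens):
    """Parse a flat token list into a nested (operator, left, right) tree."""
    tree, pos = _parse(tokens, 0, 1)
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return tree
```

With this ordering, `a UNION b INTERSECT c` groups as
`a UNION (b INTERSECT c)`, because `INTERSECT` is listed first and therefore
binds tighter than `UNION`.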