dtenedor commented on code in PR #48047:
URL: https://github.com/apache/spark/pull/48047#discussion_r1757500490


##########
sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4:
##########
@@ -604,6 +604,7 @@ queryTerm
         operator=INTERSECT setQuantifier? right=queryTerm                       #setOperation
     | left=queryTerm {!legacy_setops_precedence_enabled}?
         operator=(UNION | EXCEPT | SETMINUS) setQuantifier? right=queryTerm     #setOperation
+    | left=queryTerm OPERATOR_PIPE operatorPipeRightSide                        #operatorPipeStatement

Review Comment:
   Sure, no problem, I can try to explain it.
   
   ANTLR tokenizes each SQL query it receives, converting the input string into 
a sequence of tokens (using `SqlBaseLexer.g4`). Then the parser's job (in this 
file) is to convert that sequence of tokens into an initial unresolved logical 
plan representing the parse tree.
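
   To make the tokenizing step concrete, here is a minimal sketch of a toy lexer in Python. It is not ANTLR and not Spark's lexer: the token names other than `OPERATOR_PIPE` and the tiny keyword list are assumptions made up for illustration, and unlike Spark SQL it matches keywords case-sensitively.

```python
import re

# Hypothetical token table for illustration only; the real token
# definitions live in SqlBaseLexer.g4 and are far more extensive.
TOKEN_SPEC = [
    ("OPERATOR_PIPE", r"\|>"),
    ("KEYWORD", r"\b(?:TABLE|SELECT|LIMIT)\b"),
    ("INTEGER", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("WS", r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(sql):
    """Convert the input string into a list of (token_type, text) pairs."""
    tokens = []
    pos = 0
    while pos < len(sql):
        match = TOKEN_RE.match(sql, pos)
        if match is None:
            raise ValueError(f"unexpected character {sql[pos]!r} at offset {pos}")
        if match.lastgroup != "WS":  # whitespace separates tokens but is dropped
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

   Calling `tokenize("TABLE t |> SELECT x")` yields five tokens, with `|>` arriving as a single `OPERATOR_PIPE` token rather than as two separate characters.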
   
   To do so, the parser checks each rule in the listed sequence, one by one, 
comparing the provided tokens at the current index in the sequence with the 
tokens the rule requires. If the rule matches, meaning that all keywords and 
other components in the rule map to corresponding input tokens, then the parser 
generates the rule's unresolved logical plan tree using the logic in 
`AstBuilder.scala`.
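
   As a rough illustration of that last step, the sketch below dispatches on the label of a matched rule to build a plan node, which is the general shape of a visitor. Everything here is a simplified assumption: the tuple-based parse tree, the `"table"` label, and the two dataclasses are toy stand-ins, not the actual classes or methods in `AstBuilder.scala`.

```python
from dataclasses import dataclass


# Toy stand-ins for plan nodes (assumed for this sketch); the real
# Catalyst plan classes carry far more detail.
@dataclass(frozen=True)
class UnresolvedRelation:
    name: str


@dataclass(frozen=True)
class Project:
    columns: tuple
    child: object


def build_plan(node):
    """Build a plan node from a parse-tree node, dispatching on its label."""
    label = node[0]
    if label == "table":  # hypothetical label for `TABLE t`
        return UnresolvedRelation(node[1])
    if label == "operatorPipeStatement":
        _, left, select_columns = node
        # The left side is itself a parse-tree node, so recurse into it first.
        return Project(tuple(select_columns), build_plan(left))
    raise ValueError(f"no builder for {label!r}")


plan = build_plan(("operatorPipeStatement", ("table", "t"), ["x"]))
```

   Here `plan` is a `Project` over an `UnresolvedRelation`, and nothing has been resolved against a catalog yet.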
   
   In this case, we define the new token `OPERATOR_PIPE: '|>';` in 
`SqlBaseLexer.g4`. Then we add a new option to the existing `queryTerm` rule to 
allow any syntax matching an existing `queryTerm` to appear on the left side of 
this `|>` token and the syntax of `operatorPipeRightSide` on the right side 
(which in this PR is limited to only a `selectClause`).
   
   ANTLR grammar allows left-recursive rules wherein any alternative may begin 
with a reference to the same rule, so the `queryTerm` on the left side may 
match any valid existing syntax for a `queryTerm` such as `TABLE t`, a table 
subquery, etc. Since we are extending `queryTerm` to also match against 
`queryTerm OPERATOR_PIPE operatorPipeRightSide`, this alternative implements 
the recursion wherein we may chain multiple pipe operators together. For 
example, in `TABLE t |> SELECT x |> LIMIT 2`, `TABLE t` matches a `queryTerm`, 
then `TABLE t |> SELECT x` matches another, and finally the entire query 
matches a third (using the new recursive `#operatorPipeStatement` alternative 
two times).
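
   The nesting that this recursion produces can be sketched with a small hand-written parser in Python. This is an assumption-laden toy: it splits on whitespace instead of using a real lexer, keeps each side of `|>` as an opaque string, and replaces the left recursion with a loop that wraps the tree built so far, which gives the same left-nested shape.

```python
def parse_query_term(tokens):
    """Parse a whitespace-split toy query into a left-nested tuple tree."""
    # Split the token list into segments separated by the pipe operator.
    segments = [[]]
    for token in tokens:
        if token == "|>":
            segments.append([])
        else:
            segments[-1].append(token)
    if any(not segment for segment in segments):
        raise ValueError("each side of |> must be non-empty")

    # The first segment is the base case: an ordinary queryTerm.
    tree = ("queryTerm", " ".join(segments[0]))
    # Each further segment wraps everything parsed so far as its left side.
    for right_side in segments[1:]:
        tree = ("operatorPipeStatement", tree, " ".join(right_side))
    return tree


tree = parse_query_term("TABLE t |> SELECT x |> LIMIT 2".split())
```

   The resulting `tree` has `LIMIT 2` at the root, `SELECT x` one level down, and `TABLE t` at the bottom, so the pipe operators apply left to right.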
   
   Otherwise, if the rule does not match, then the parser moves on to try the 
next rule in the sequence, and so on, similar to a Scala pattern-match. This 
defines the precedence of the rules amongst each other: the ones appearing 
first in the list in `SqlBaseParser.g4` apply first.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

