Julian Hyde created CALCITE-5208:
------------------------------------

             Summary: Reduce size of generated parser by optimizing lookahead
                 Key: CALCITE-5208
                 URL: https://issues.apache.org/jira/browse/CALCITE-5208
             Project: Calcite
          Issue Type: Bug
            Reporter: Julian Hyde


We could reduce the size of generated parser by optimizing lookahead.

The main problem is identifiers and non-reserved keywords. Every point in the 
grammar where the parser needs to choose between an identifier and something 
else the parser needs to do lookahead, so it generates a {{switch}} expression. 
But that {{switch}} contains not just the identifier tokens ({{IDENTIFIER}}, 
{{UNICODE_QUOTED_IDENTIFIER}} and 5 others) but about 300 non-reserved keywords.

Switches like this, which are typically 300 to 460 lines long, occur 53 times 
in the current parser. That's 24,000 lines of code, and the entire parser is 
38,000 lines of code. And there are several parsers (core, babel, server, test).

{noformat}
$ find . -name SqlParserImpl.java
./core/build/javacc/javaCCMain/org/apache/calcite/sql/parser/impl/SqlParserImpl.java
$ wc 
core/build/javacc/javaCCMain/org/apache/calcite/sql/parser/impl/SqlParserImpl.java
   38242   86680  989569 
core/build/javacc/javaCCMain/org/apache/calcite/sql/parser/impl/SqlParserImpl.java
$ grep 'case ZONE:' 
core/build/javacc/javaCCMain/org/apache/calcite/sql/parser/impl/SqlParserImpl.java
 |wc
      53     106     865
$ grep 'case [A-Z]*:' 
core/build/javacc/javaCCMain/org/apache/calcite/sql/parser/impl/SqlParserImpl.java
 |wc
   13507   27014  255827
{noformat}

We can optimize by putting common patterns into shared methods. For example, 
this pattern occurs several times, and could be a method {{AliasOpt()}}:
{code}
( [ <AS> ] alias = SimpleIdentifier() | { alias = null; } )
{code}

It might be possible to remove a choice. Rather than choosing whether or not to 
fire a rule, we could fire the rule, call the method, and have the method 
return null.

I don't know how many of the 53 switches we can remove. If we can remove just 
10 that would be a significant number of lines of code saved.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to