If you'd like to contribute a patch to Impala, but aren't sure what you
want to work on, you can look at Impala's newbie issues:
https://issues.apache.org/jira/issues/?filter=12341668. You can find
detailed instructions on submitting patches at
https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala.
This is a walkthrough of a ticket a new contributor could take on, with,
hopefully, enough detail to get you going but not so much that it takes
away the fun.

How can we fix https://issues.apache.org/jira/browse/IMPALA-941, "Impala
Parser issue when using fully qualified table names that start with a
number"? First, you'll want to get your development environment set up and
make sure the parse tests are passing. Follow the examples at
https://cwiki.apache.org/confluence/display/IMPALA/How+to+load+and+run+Impala+tests
such as this one:

(pushd fe && mvn -fae test -Dtest=ParserTest)

This runs the tests in the file
fe/src/test/java/org/apache/impala/analysis/ParserTest.java. Now that you
have checked that your development environment is working, add a new test
to ParserTest.java. The ticket contains an example of a statement that
fails to parse. Given that test case, can you find a method in
ParserTest.java that should be testing this statement? If not, make a new
test method annotated with @Test and with a method name starting with
"Test".

Now run the test again. It should fail and give an error message similar to
the one in the ticket. You should now be ready to fix the bug.

The lexing and parsing of SQL are performed in
fe/src/main/jflex/sql-scanner.flex and fe/src/main/cup/sql-parser.cup,
respectively. The error message indicates "Encountered: DECIMAL LITERAL".
If you run "git grep 'DECIMAL LITERAL'", you will see that this is
referenced only in sql-scanner.flex. This is because a decimal literal is
lexed as a single token. In other words, in the query from the ticket,
"INVALIDATE METADATA db.571_market", "db" is lexed as IdentifierOrKw,
".571" is lexed as a DecimalLiteral, and "_market" is lexed as
IdentifierOrKw.
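
To see why the lexer splits the name this way, here is a minimal sketch of
a longest-match lexer. It is a toy, not Impala's scanner: the class name
GreedyLexerSketch and the regular expressions are assumptions made up for
illustration, and only the token names are borrowed from the text above.
The real rules live in sql-scanner.flex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy longest-match lexer. NOT Impala's scanner; rules are simplified guesses. */
public class GreedyLexerSketch {
  // Hypothetical rules: {token name, regular expression}.
  private static final String[][] RULES = {
    {"DecimalLiteral", "[0-9]*\\.[0-9]+"},
    {"IntegerLiteral", "[0-9]+"},
    {"IdentifierOrKw", "[A-Za-z_][A-Za-z0-9_]*"},
    {"DOT", "\\."},
  };

  /** Splits the input into tokens, always taking the longest match at each position. */
  public static List<String> lex(String input) {
    List<String> tokens = new ArrayList<>();
    int pos = 0;
    while (pos < input.length()) {
      if (Character.isWhitespace(input.charAt(pos))) {
        pos++;  // Whitespace only separates tokens.
        continue;
      }
      String bestName = null;
      int bestEnd = pos;
      for (String[] rule : RULES) {
        Matcher m = Pattern.compile(rule[1]).matcher(input);
        m.region(pos, input.length());
        // lookingAt() anchors the match at the start of the region.
        if (m.lookingAt() && m.end() > bestEnd) {
          bestName = rule[0];
          bestEnd = m.end();
        }
      }
      if (bestName == null) {
        throw new IllegalArgumentException("Cannot lex at position " + pos);
      }
      tokens.add(bestName + "(" + input.substring(pos, bestEnd) + ")");
      pos = bestEnd;
    }
    return tokens;
  }

  public static void main(String[] args) {
    // Prints [IdentifierOrKw(db), DecimalLiteral(.571), IdentifierOrKw(_market)]
    System.out.println(lex("db.571_market"));
  }
}
```

At the position of the dot, both the DOT rule and the DecimalLiteral rule
match, but DecimalLiteral matches four characters instead of one, so the
longest-match policy picks it.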

To fix this, you need "db.571_market" to be lexed as the sequence
IdentifierOrKw SqlParserSymbols.DOT IdentifierOrKw. The parser can then
match that sequence, dot included, as a table_name in sql-parser.cup. For
that to work, the lexer must not over-eagerly identify a DecimalLiteral.
You can probably achieve that by delaying the recognition of decimal
literals to the parser. Try to translate the lexer's definition of
DecimalLiteral to a definition that works in the parser.
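
The idea of delaying decimal recognition can be sketched outside of JFlex
and CUP. The toy below is not the actual fix: the class
DeferredDecimalSketch, its regular expressions, and the foldDecimals()
helper are all hypothetical. It assumes the lexer has no DecimalLiteral
rule and that identifiers may begin with digits, and it uses a plain Java
method as a stand-in for the grammar rule you would write in
sql-parser.cup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy sketch: decimals are assembled after lexing. NOT Impala's code. */
public class DeferredDecimalSketch {
  static final class Token {
    final String kind;
    final String text;
    Token(String kind, String text) { this.kind = kind; this.text = text; }
    @Override public String toString() { return kind + "(" + text + ")"; }
  }

  // Hypothetical rules: no DecimalLiteral, identifiers may start with digits.
  private static final String[][] RULES = {
    {"IntegerLiteral", "[0-9]+"},
    {"IdentifierOrKw", "[0-9]*[A-Za-z_][A-Za-z0-9_]*"},
    {"DOT", "\\."},
  };

  /** Longest-match lexing over the simplified rules above. */
  public static List<Token> lex(String input) {
    List<Token> tokens = new ArrayList<>();
    int pos = 0;
    while (pos < input.length()) {
      if (Character.isWhitespace(input.charAt(pos))) { pos++; continue; }
      String bestName = null;
      int bestEnd = pos;
      for (String[] rule : RULES) {
        Matcher m = Pattern.compile(rule[1]).matcher(input);
        m.region(pos, input.length());
        if (m.lookingAt() && m.end() > bestEnd) {
          bestName = rule[0];
          bestEnd = m.end();
        }
      }
      if (bestName == null) {
        throw new IllegalArgumentException("Cannot lex at position " + pos);
      }
      tokens.add(new Token(bestName, input.substring(pos, bestEnd)));
      pos = bestEnd;
    }
    return tokens;
  }

  /** Stand-in for a grammar rule: [IntegerLiteral] DOT IntegerLiteral is one decimal. */
  public static List<Token> foldDecimals(List<Token> in) {
    List<Token> out = new ArrayList<>();
    int i = 0;
    while (i < in.size()) {
      Token t = in.get(i);
      Token prev = out.isEmpty() ? null : out.get(out.size() - 1);
      boolean dotThenInt = t.kind.equals("DOT") && i + 1 < in.size()
          && in.get(i + 1).kind.equals("IntegerLiteral");
      boolean afterIdent = prev != null && prev.kind.equals("IdentifierOrKw");
      if (dotThenInt && !afterIdent) {
        String whole = "";
        if (prev != null && prev.kind.equals("IntegerLiteral")) {
          whole = prev.text;            // Absorb the integer part, e.g. the 1 in 1.5.
          out.remove(out.size() - 1);
        }
        out.add(new Token("DecimalLiteral", whole + "." + in.get(i + 1).text));
        i += 2;
      } else {
        out.add(t);                     // A DOT after an identifier stays a DOT.
        i++;
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // Prints [IdentifierOrKw(db), DOT(.), IdentifierOrKw(571_market)]
    System.out.println(foldDecimals(lex("db.571_market")));
    // Prints [DecimalLiteral(1.5)]
    System.out.println(foldDecimals(lex("1.5")));
  }
}
```

One thing this sketch glosses over: once whitespace has been discarded by
the lexer, "1.5" and "1 . 5" produce the same token sequence, so a real
grammar change would need to decide whether that matters.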

You'll probably find the manuals for the lexer and the parser useful:

http://jflex.de/manual.html
http://www2.cs.tum.edu/projects/cup/docs.php

Have fun! Once all the tests are passing again, you can send your patch for
review following
https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala.
