If you'd like to contribute a patch to Impala but aren't sure what to work on, you can look at Impala's newbie issues: https://issues.apache.org/jira/issues/?filter=12341668. You can find detailed instructions on submitting patches at https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala. This is a walkthrough of a ticket a new contributor could take on, with hopefully enough detail to get you going but not so much as to take away the fun.
How can we fix https://issues.apache.org/jira/browse/IMPALA-5614, "Add COMMENT ON syntax to support comments on all objects"? First, set up your development environment. Then launch bin/impala-shell.sh to see that the syntax, as expected, doesn't yet work:

    $ bin/impala-shell.sh
    Starting Impala Shell without Kerberos authentication
    Connected to localhost:21000
    Server version: impalad version 2.10.0-SNAPSHOT DEBUG (build 23d79462da5d0108709e8b1399c97606f4ebdf92)
    ***********************************************************************************
    Welcome to the Impala shell.
    (Impala Shell v2.10.0-SNAPSHOT (23d7946) built on Thu Aug 31 23:52:28 PDT 2017)

    The HISTORY command lists all shell commands in chronological order.
    ***********************************************************************************
    [localhost:21000] > COMMENT ON DATABASE functional IS 'Development Database';
    Query: comment ON DATABASE functional IS 'Development Database'
    Query submitted at: 2017-09-01 21:19:11 (Coordinator: http://jbapple-optiplex:25000)
    ERROR: AnalysisException: Syntax error in line 1:
    comment ON DATABASE functional IS 'Development Database'
    ^
    Encountered: COMMENT
    Expected: ALTER, COMPUTE, CREATE, DELETE, DESCRIBE, DROP, EXPLAIN, GRANT, INSERT, INVALIDATE, LOAD, REFRESH, REVOKE, SELECT, SET, SHOW, TRUNCATE, UPDATE, UPSERT, USE, VALUES, WITH

    CAUSED BY: Exception: Syntax error

The first thing you'll want to do is change the parser to recognize statements of this form. Statements are parsed in the front end, and Impala uses a traditional lex-then-parse method to generate the abstract syntax tree. The lexer is written in JFlex and is located in fe/src/main/jflex. The parser is written in CUP and is located in fe/src/main/cup/. If you look at the lexer, you'll see that all of the keywords referenced in the ticket (COMMENT, ON, DATABASE, TABLE, COLUMN, and IS) are already keywords of the language, so you won't need to alter the lexer.
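To make the lex-then-parse split concrete, here is a minimal, self-contained Java sketch of the two phases for a single statement form. It is not Impala's code: the real lexer is generated by JFlex and the real parser by CUP, and every name here (CommentOnSketch, lex, parse, CommentOnDb) is hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only. Impala generates its lexer with JFlex and its parser
// with CUP; all names in this class are hypothetical.
class CommentOnSketch {

  // Minimal AST node for: COMMENT ON DATABASE <name> IS '<comment>'
  static final class CommentOnDb {
    final String dbName;
    final String comment;

    CommentOnDb(String dbName, String comment) {
      this.dbName = dbName;
      this.comment = comment;
    }
  }

  // Phase 1 (lexing): split the input into tokens. A single-quoted string
  // literal becomes one token that keeps its quotes; whitespace and ';' are skipped.
  static List<String> lex(String sql) {
    List<String> tokens = new ArrayList<>();
    int i = 0;
    while (i < sql.length()) {
      char c = sql.charAt(i);
      if (Character.isWhitespace(c) || c == ';') {
        i++;
      } else if (c == '\'') {
        int end = sql.indexOf('\'', i + 1);
        if (end < 0) {
          throw new IllegalArgumentException("Unterminated string literal");
        }
        tokens.add(sql.substring(i, end + 1));
        i = end + 1;
      } else {
        int start = i;
        while (i < sql.length() && !Character.isWhitespace(sql.charAt(i))
            && sql.charAt(i) != ';' && sql.charAt(i) != '\'') {
          i++;
        }
        tokens.add(sql.substring(start, i));
      }
    }
    return tokens;
  }

  // Phase 2 (parsing): check the token sequence against the one production
  // this sketch knows and build the AST node.
  static CommentOnDb parse(List<String> tokens) {
    if (tokens.size() != 6) {
      throw new IllegalArgumentException("Syntax error: expected 6 tokens");
    }
    expectKeyword(tokens.get(0), "COMMENT");
    expectKeyword(tokens.get(1), "ON");
    expectKeyword(tokens.get(2), "DATABASE");
    expectKeyword(tokens.get(4), "IS");
    String literal = tokens.get(5);
    if (literal.charAt(0) != '\'') {
      throw new IllegalArgumentException("Syntax error: expected string literal");
    }
    // Strip the surrounding quotes from the literal.
    return new CommentOnDb(tokens.get(3), literal.substring(1, literal.length() - 1));
  }

  // Keywords are matched case-insensitively.
  private static void expectKeyword(String token, String keyword) {
    if (!token.equalsIgnoreCase(keyword)) {
      throw new IllegalArgumentException(
          "Syntax error: encountered " + token + ", expected " + keyword);
    }
  }

  public static void main(String[] args) {
    CommentOnDb stmt =
        parse(lex("COMMENT ON DATABASE functional IS 'Development Database';"));
    System.out.println(stmt.dbName + " -> " + stmt.comment);
  }
}
```

In the real front end the same division of labor applies, except that both phases are generated from the JFlex and CUP specification files rather than written by hand.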
If you look at the parser, you'll see it's in a BNF-like format, with stmt as the top-level starting non-terminal. You'll probably want to add a new type of statement, perhaps something like comment_on_stmt.

First, build the frontend with ./buildall.sh -fe_only so that you can iterate quickly on the changes you are making. Then try copying an existing statement type to make your new COMMENT ON statement. I'd recommend starting with a single type of COMMENT ON and making sure that it works, including tests, before you do the other types. You might even want to break this up into multiple commits: first get COMMENT ON DATABASE working, tested, through code review, and committed, before doing the rest.

Some places to look when modifying or adding files:

fe/src/main/java/org/apache/impala/analysis contains the statement type classes used in the front-end "analysis", which runs on the AST.

fe/src/main/java/org/apache/impala/service contains Frontend.java, which can analyze a statement and turn it into a DDL request, and CatalogOpExecutor.java, which can execute operations that alter tables.

For both of those directories, there is a corresponding directory in fe/src/test/java/org/apache/impala with unit tests. You will probably want to add some unit tests there.

common/thrift contains Thrift definitions for the statement types that the catalog can execute.

testdata/workloads/functional-query/queries/QueryTest contains .test files for running end-to-end tests.

That should hopefully be enough to get you started. Have fun!
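As one possible starting point for the statement class mentioned above, here is a standalone Java sketch of what a COMMENT ON DATABASE statement might carry: the parsed fields, a semantic check, and a way to print itself back as SQL. It deliberately does not use Impala's own base classes or catalog, so it runs on its own; the class name CommentOnDbStmtSketch, its methods, and the use of a plain set of names as a stand-in for the catalog are all assumptions made for illustration. Copy the structure of an existing statement class in fe/src/main/java/org/apache/impala/analysis rather than this sketch when writing the real thing.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Standalone sketch. A real statement class would build on Impala's own
// analysis classes and catalog; every name below is hypothetical.
class CommentOnDbStmtSketch {
  private final String dbName;
  private final String comment;

  CommentOnDbStmtSketch(String dbName, String comment) {
    this.dbName = dbName;
    this.comment = comment;
  }

  // Stand-in for semantic analysis: the set of names plays the role of the
  // catalog, and the only check is that the target database exists.
  void analyze(Set<String> knownDbs) {
    if (!knownDbs.contains(dbName)) {
      throw new IllegalStateException("Database does not exist: " + dbName);
    }
  }

  // Renders the statement back as SQL. Escaping of quotes inside the comment
  // is intentionally omitted to keep the sketch short.
  String toSql() {
    return "COMMENT ON DATABASE " + dbName + " IS '" + comment + "'";
  }

  public static void main(String[] args) {
    Set<String> knownDbs = new HashSet<>(Arrays.asList("functional"));
    CommentOnDbStmtSketch stmt =
        new CommentOnDbStmtSketch("functional", "Development Database");
    stmt.analyze(knownDbs);
    System.out.println(stmt.toSql());
  }
}
```

A unit test for the real class would exercise the same two paths: a statement that analyzes cleanly, and one that fails because the database does not exist.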