Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

Patch Set 1:


Ok, here's a bit to get started. Just made my way through most of analysis. 
Still have a lot to get through but figured I'll give you the feedback I have 
while I do a few other things.
Commit Message:

PS1, Line 21: When attempting to create an external
            :   table "foo" in database "bar", Impala will search for a Kudu 
            :   name "" and "bar" (Kudu doesn't have database name spaces
            :   yet.)
I think this sentence is duplicated by #10. Maybe remove this part. This item 
is about reading the columns, right? The issue of table name resolution is 
separate IMO, and covered by #10, though consider moving them next together :)

PS1, Line 25:  The Kudu table is now required to exist at the time of creation 
            :   Impala.
for external tables? or if this is more general can you explain a bit more?

PS1, Line 31: default value
can it still be overridden?

PS1, Line 39: Eventually Hive should have the needed
            :   functionality and the Kudu delegate (renamed in this patch to 
            :   can be removed.
maybe, it's not clear if impala would want to keep the ability to do this on 
its own... Remove?

PS1, Line 54:  If the database is "default" then
            :   the Kudu table name will not include the database.
hm... I wonder if it'd be easier just to keep it standardized and always have 
the name, i.e. using default in this case. What do you think? I'll see if I 
change my mind after reading more code.

PS1, Line 56: 11) Several improvements in the grammar related to the family
            :   of CREATE TABLE statements.
I don't think this bullet adds much value to the commit message. Everything 
else is more user visible, this is just some necessary code cleanup.

PS1, Line 58: 12) Added new tests and modified existing Kudu test to use the new
            :   CREATE TABLE syntax.
I think this can be removed too, it should be implied there are tests and it 
takes away from the billion other "features" in this patch :)

PS1, Line 60: source of truth
Does this mean we always hit Kudu for metadata? In the Catalog? In the 
impalad's Catalog?

PS1, Line 60: insteads

PS1, Line 64: Not included in this commit:
            : - Additional column properties such as nullability and compression
            : encodings.
I don't think you need to include this, we have a separate JIRA for this.
File fe/src/main/cup/sql-parser.cup:

FYI I'm not looking at this very carefully because it was reviewed previously

Let me know if this has changed much since then.

PS1, Line 405:   view_column_def_list, view_column_defs;
separate nonterminals if they dont fit?

PS1, Line 458: key_ident;
does this not need Boolean like those below?
File fe/src/main/java/com/cloudera/impala/analysis/

I'm not in love with the TableDef and TableDefOptions stuff because it feels 
like that's parser implementation concerns bleeding into analysis code.

Do you think we need both? Can we fold the latter into TableDef?

If we have to keep both, it'd be helpful to at least group the functions in 
this file based on their source (e.g. TableDef, TDOptions, or on this itself) 
with comments around the groups? Maybe within those groups they could also be 
sorted by visibility? There's a lot of stuff going on so I think it'd be 
helpful to keep it tidy.

PS1, Line 240: 
is this needed? I thought there was going to be code that fetches the columns 
from Kudu and stores them.

PS1, Line 242:       if 
(!getTblProperties().containsKey(KuduTable.KEY_TABLE_NAME)) {
             :         getTblProperties().put(KuduTable.KEY_TABLE_NAME,
             :             KuduUtil.getDefaultCreateKuduTableName(getDb(), 
             :       }
hm... can we simplify this and only allow internal tables to be created with 
the name db_name.table_name? Obviously there could be conflicts and it should 
fail then. Then it'd be 1 less case to consider.

Also, I wonder if we should prefix the underlying kudu table names with 
something, e.g. IMPALA_db.table?

PS1, Line 275:         // Kudu's data distribution rules are enforced here. A 
good reference is
             :         // As a summary:
             :         // * Only primary key column can be used in distribution 
             :         // * Distributions may be either hash or ranged (value 
             :         // * Multiple distributions can be used/mixed
             :         //   * A column can only be used in one hash distribution
             :         //   * A range distribution can only be used once and 
must be the last definition
             :         //     (both enforced by the grammar).
I don't think we should get in the business of validating the Kudu rules since 
they should reject it. Is there a reason we have to do this ourselves as well?

PS1, Line 323: throwIfPrimaryKeysWereSpecified
I think the callers can throw, simpler to have a fn to check: bool 
hasPrimaryKeysSpecified(). Or maybe this isn't necessary? see my commetn below.

PS1, Line 327: isPrimaryKey
when will we have PKs from the col defs but not in the 

Better yet, it looks like the 2 callers of this fn have slightly different use 
cases so maybe it can be scrapped and handled more specifically. Feel free to 

PS1, Line 327:         if (colDef.isPrimaryKey()) {
             :           hasKey = true;
             :         }

PS1, Line 335: throwIfNotPrimaryKeyType
I only see this called once, please inline it
File fe/src/main/java/com/cloudera/impala/analysis/

PS1, Line 32: Represents the end of a CREATE TABLE statement. TableDef 
represents the beginning of
            :  * the statement.
I find this odd. The 'beginning/end of the statement' feels like a parser 
implementation detail.

I'm not sure why this needs to be a separate class. TableDef seems OK, and it 
already has some things that get set after it is constructed (e.g. partitions), 
so maybe these things can be moved to TableDef and set via addl functions.
File fe/src/main/java/com/cloudera/impala/analysis/

PS1, Line 207: HdfsFileFormat.KUDU
blegh this looks like an oxymoron. I wonder if we can rename this class to 
something like StorageFormat. I'll keep an eye out as I read more code.

PS1, Line 212: keyCols != null
why would this be null?

PS1, Line 217: kuduTableName != null
why would this be null?

PS1, Line 217:       if (kuduTableName != null
             :           && 
             :               table.getDb().getName(), table.getName()))) {
             :         properties.remove(KuduTable.KEY_TABLE_NAME);
             :       }
might just be cleaner to leave this in and not have to have 2 cases to test
File infra/python/deps/requirements.txt:

PS1, Line 67: # kudu-python==0.2.0
When this changes I think there may be some steps involved in "upgrading" the 
environment- have you figured out what that is? We'll need to tell people what 
command(s) to run after fetching this change.

PS1, Line 70:   # These should eventually be removed
            :   unittest2 == 1.1.0
            :     linecache2 == 1.0.0
            :     traceback2 == 1.4.0

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Dimitris Tsirogiannis <>
Gerrit-Reviewer: Matthew Jacobs <>
Gerrit-HasComments: Yes

Reply via email to