[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

Matthew Jacobs (Code Review) Fri, 16 Sep 2016 09:27:37 -0700

Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
......................................................................



Patch Set 1:

(17 comments)

Another batch of comments; there are still a lot of files I haven't touched yet.

http://gerrit.cloudera.org:8080/#/c/4414/1/be/src/catalog/catalog.cc
File be/src/catalog/catalog.cc:

PS1, Line 47:     {"<init>", "(ZILjava/lang/String;IIZLjava/lang/String;)V",
            :       &catalog_ctor_},
Put this on one line.


http://gerrit.cloudera.org:8080/#/c/4414/1/be/src/service/frontend.cc
File be/src/service/frontend.cc:

PS1, Line 37: // XXX: This flag doesn't seem to be used anywhere. Maybe remove it?
            : DEFINE_bool(load_catalog_at_startup, false, "if true, load all catalog data at startup");
Agreed. It doesn't look like it's used by JniFrontend. Let's remove it.


PS1, Line 63: IPs
Please spell this out as "IP addresses".


PS1, Line 63:  
Remove the space here.


http://gerrit.cloudera.org:8080/#/c/4414/1/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

PS1, Line 52: THdfsFileFormat
Maybe not for this patch since it's already huge, but it'd be great to 
generalize this if we can. I can think of two improvements:
1) Maybe we should model the storage layer, e.g. have a TStorageEngine, then 
make this TFileFormat (perhaps). This is probably a big change.
2) Rename this to be TStorageFormat, which kind of addresses #1 but doesn't 
separate out storage engines and file formats.


PS1, Line 61: THdfsCompression
Similarly, this seems unnecessarily specific to HDFS. It's not necessarily
something to change now, but maybe we can create a follow-up JIRA to clean this up.


http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

PS1, Line 31: ColumnDefOptions
This class doesn't exist. Please add the missing file.


http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/analysis/DistributeParam.java
File fe/src/main/java/com/cloudera/impala/analysis/DistributeParam.java:

PS1, Line 84: BigDecimal
Why BigDecimal? Ultimately this has to resolve to some int for Kudu's API. (We
can check whether it's 32-bit or 64-bit.)
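To illustrate the point about resolving to an int, here is a minimal sketch, assuming the parser hands back a java.math.BigDecimal literal. The class and method names (NumericLiteralSketch, toInt, toLong) are hypothetical and not part of the patch; only the standard BigDecimal.intValueExact() and longValueExact() calls are real.

```java
import java.math.BigDecimal;

final class NumericLiteralSketch {
  /** Converts a literal to a 32-bit int, rejecting fractions and overflow. */
  static int toInt(BigDecimal literal) {
    try {
      // intValueExact() throws ArithmeticException if the value has a nonzero
      // fractional part or does not fit in 32 bits.
      return literal.intValueExact();
    } catch (ArithmeticException e) {
      throw new IllegalArgumentException("Not a 32-bit integer: " + literal, e);
    }
  }

  /** Same idea for a 64-bit value. */
  static long toLong(BigDecimal literal) {
    try {
      return literal.longValueExact();
    } catch (ArithmeticException e) {
      throw new IllegalArgumentException("Not a 64-bit integer: " + literal, e);
    }
  }

  public static void main(String[] args) {
    System.out.println(toInt(new BigDecimal("16")));
    System.out.println(toLong(new BigDecimal("4294967296")));
  }
}
```

Doing the conversion once at analysis time would let the rest of the code carry a plain int or long instead of a BigDecimal.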


PS1, Line 110: <= 1)
Is 1 bucket actually invalid?


PS1, Line 138: colType.isStringType() && !exprType.isStringType()
             :               || colType.isIntegerType() && (!exprType.isIntegerType()
             :                   || exprType.getPrecision() > colType.getPrecision())
1. I don't see anything in the Kudu client that explicitly says you can't
partition on any particular types. This code will exclude boolean and
floating-point types, which is maybe unnecessary.
2. Especially if we can address #1, is there a cleaner way to make sure the
types are valid rather than enumerating the kinds of types to consider? I'm not
sure, but maybe one of the frontend gurus can think of something.
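One possible shape for a check that doesn't enumerate type kinds, as a rough sketch only: compare a type "family" and a width rank. The Family and SqlType enums and the isCompatible helper below are hypothetical stand-ins for illustration; they are not Impala's Type API, and the real frontend may already have a better mechanism.

```java
final class PartitionTypeCheckSketch {
  // Hypothetical type families; not Impala's Type API.
  enum Family { BOOLEAN, INTEGER, FLOATING_POINT, STRING }

  // Hypothetical column types. 'rank' orders types within a family by width.
  enum SqlType {
    BOOLEAN(Family.BOOLEAN, 0),
    TINYINT(Family.INTEGER, 0),
    SMALLINT(Family.INTEGER, 1),
    INT(Family.INTEGER, 2),
    BIGINT(Family.INTEGER, 3),
    FLOAT(Family.FLOATING_POINT, 0),
    DOUBLE(Family.FLOATING_POINT, 1),
    STRING(Family.STRING, 0);

    final Family family;
    final int rank;

    SqlType(Family family, int rank) {
      this.family = family;
      this.rank = rank;
    }
  }

  /**
   * Returns true if a value of 'exprType' can be used for a column of 'colType':
   * same family, and the expression is no wider than the column.
   */
  static boolean isCompatible(SqlType colType, SqlType exprType) {
    return colType.family == exprType.family && exprType.rank <= colType.rank;
  }

  public static void main(String[] args) {
    System.out.println(isCompatible(SqlType.INT, SqlType.TINYINT));
    System.out.println(isCompatible(SqlType.TINYINT, SqlType.BIGINT));
    System.out.println(isCompatible(SqlType.DOUBLE, SqlType.FLOAT));
  }
}
```

With this shape, boolean and floating-point columns are covered by the same rule as strings and integers, so no per-kind branches are needed.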


http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/catalog/HdfsFileFormat.java
File fe/src/main/java/com/cloudera/impala/catalog/HdfsFileFormat.java:

PS1, Line 39: HdfsFileFormat
Can you open a JIRA (and leave it in the comment) to refactor this later?


http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java
File fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java:

PS1, Line 79: 
            :   // TODO we should have something like KuduConfig.getDefaultConfig()
Do you know what this means? Can it be removed now that we're adding default
master addresses?


Line 92:   public static final String KEY_DISTRIBUTE_BY = "kudu.distribute_by";
Please add a comment.


Line 153:    * Load the columns from the schema list
Can you add a comment about error handling in this function?


PS1, Line 158:       LOG.error(String.format("Kudu tables must have at least one"
             :           + "key column (had %d), and no more key columns than there are table columns "
             :           + "(had %d).", keyColumns.size(), schema.size()));
Shouldn't this still fail? If not, can you add a comment explaining why this continues?


PS1, Line 184:       LOG.error(String.format("Some key columns were not found in"
             :               + " the set of columns. List of column names: %s, List of key column names:"
             :               + " %s", Iterables.toString(columnNames), Iterables.toString(keyColumns)));
Why do we continue?
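For both key-column checks, something along these lines is what I'd expect if the load is supposed to fail. This is a sketch under assumptions: the conditions are inferred from the two log messages quoted above, and SchemaValidationException and validateKeyColumns are hypothetical names, not anything in the patch.

```java
import java.util.Arrays;
import java.util.List;

final class KeyColumnCheckSketch {
  /** Hypothetical exception type, used only for this sketch. */
  static final class SchemaValidationException extends RuntimeException {
    SchemaValidationException(String msg) {
      super(msg);
    }
  }

  /** Throws instead of logging and continuing when the key columns are invalid. */
  static void validateKeyColumns(List<String> keyColumns, List<String> columnNames) {
    if (keyColumns.isEmpty() || keyColumns.size() > columnNames.size()) {
      throw new SchemaValidationException(String.format(
          "Kudu tables must have at least one key column (had %d), and no more key "
              + "columns than there are table columns (had %d).",
          keyColumns.size(), columnNames.size()));
    }
    if (!columnNames.containsAll(keyColumns)) {
      throw new SchemaValidationException(String.format(
          "Some key columns were not found in the set of columns. List of column "
              + "names: %s, List of key column names: %s",
          columnNames, keyColumns));
    }
  }

  public static void main(String[] args) {
    validateKeyColumns(Arrays.asList("id"), Arrays.asList("id", "name"));
    System.out.println("valid schema accepted");
    try {
      validateKeyColumns(Arrays.asList("missing"), Arrays.asList("id", "name"));
    } catch (SchemaValidationException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

If continuing after the error is intentional, a comment at each LOG.error call explaining what state the table is left in would serve the same purpose.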


PS1, Line 199:     // Get the table metadata from Kudu
             :     if (reuseMetadata) {
I'm confused about this. It's not clear to me from the name 'reuseMetadata' why 
this means we should populate the metadata from Kudu. If anything, it sounds 
like it would be the opposite.

The base class comment just says "If 'reuseMetadata' is true, reuse valid
existing metadata."


-- 
To view, visit http://gerrit.cloudera.org:8080/4414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-HasComments: Yes
