Alex Behm has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

Patch Set 5:


Sending out the first wave.
Commit Message:

Line 45:    "KEY" is expected to be common enough that the ident style
Might also want to mention that "key" is used for nested map types, so it's not 
desirable to make it a keyword.

Line 51:    on the HMS database and table name. If the database is "default",
Does this last regarding "default" make the code easier or more complicated? 
What's the motivation for special casing it, i.e., what's the harm in including 
the "default" database name?
File be/src/service/

Line 61: DEFINE_string(kudu_master_addrs, "", "Specifies the default Kudu 
master(s). The given "
Can we make this more consistent with out existing options? For example:

File fe/src/main/java/org/apache/impala/analysis/

Line 176:   static List<String> toColumnNames(Collection<ColumnDef> columnDefs) 

Line 184:   static Map<String, ColumnDef> 
mapByColumnNames(Collection<ColumnDef> colDefs) {
comment that colDefs is assumed have no duplicate names, or alternatively deal 
with that case
File fe/src/main/java/org/apache/impala/analysis/

Line 99:           "does not support (%s) file format. Supported formats are: 
the (%s) file format.

Line 100:           createStmt_.getFileFormat().toString().replace("_", ""),
what's with this weird "_" replacement?

Line 101:           "PARQUET, TEXTFILE, KUDU"));
use SUPPORTED_INSERT_FORMATS instead of hardcoding
File fe/src/main/java/org/apache/impala/analysis/

Line 80:   private void setColumnDefs(List<ColumnDef> columnDefs) {
nit: colDefs everywhere

Line 95:   Map<String, String> getTblProperties() { return 
tableDef_.getTblProperties(); }
is there any significance to not adding public/private to some of these methods?

Line 192:           "Only Kudu tables can use DISTRIBUTE BY clause.");
the DISTRIBUTE BY clause
File fe/src/main/java/org/apache/impala/analysis/

Line 43:  * Represents the table parameters in a CREATE TABLE statement.
Let's be more precise and list the clauses that this captures.
File fe/src/main/java/org/apache/impala/catalog/

Line 97:   // Table name in Kudu storage engine. It may not neccessarily be the 
same as the
the Kudu storage engine

Line 101:   // 1. For managed tables, 'kuduTableName_' is prefixed with 
'__IMPALA__<db_name>' to
Can we make the prefix 'impala::<db_name>' instead?

Better to keep the "impala" part lowercase for consistency.

It might be a minor detail, but keeping the name readable seems important. I'd 
imagine that Kudu tables will be accessed from several tools.

Line 154:    * are loaded from HMS. The function also updates the table schema 
in HMS.
Add a comment about why we also update the HMS. My understanding is that we 
want to propagate alterations made to the Kudu table to the HMS.

Line 195:       msClient.alter_table(msTable_.getDbName(), 
msTable_.getTableName(), msTable_);
This means every invalidate/refresh will do an extra alter to HMS. Are we 
concerned about the extra load?

Line 202:    * Loads the schema from the Kudu table including column 
definitions and primary key
mention that it loads the schema into this table as well as the HMS table

Line 216:       String colName = ToSqlUtils.getIdentSql(colSchema.getName());
getIdentSql() may add backticks, I don't think we want that here

Line 223:     numClusteringCols_ = primaryKeyColumnNames_.size();
Check that there is at least one PK column

Line 228:     PartitionSchema partitionSchema = kuduTable.getPartitionSchema();
Preconditions.checkState(!colsByPos_.isEmpty()); ?

Line 254:   public static KuduTable createTmpTable(Db db,
createCtasTarget()? Seems more explicit since there really is only one use.
File fe/src/main/java/org/apache/impala/service/

Line 1117:     if (db != null) dropTablesFromKudu(db);
Won't this drop all Kudu tables even if CASCADE is not specified?

Line 1151:     // the metadata must be fetched from the Hive Metastore.
Why do we need to load from HMS? See comments below.

Line 1167:         throw new ImpalaRuntimeException(
This seems like weird behavior. The user issued a cascading drop db, but gets 
an error because one of the tables could not be fetched from the HMS. It seems 
fine to ignore the error because the user wanted to drop everything anyway.

Line 1172:       if (!KuduTable.isKuduTable(msTable)) continue;
Why do we need this check? Won't Kudu tell us if the table does not exist? 
Seems fine to ignore that error.

Line 1438:     StorageDescriptor sd = HiveStorageDescriptorFactory.createSd(
should we consolidate all the SD stuff into one createSd() call?

Line 1453:     if (params.getPartition_columns() != null) {
for my understanding: these are empty for Kudu tables right?
File fe/src/main/java/org/apache/impala/service/

Line 41: //import org.apache.hadoop.hive.metastore.api.Table;

Line 53:  * such as creating and droping tables from Kudu.
typo: dropping

Line 58:   public final static ColumnDef REQUIRED_THRIFT_COLUMN_DEF;
Isn't it enough to set an empty list or mark the thrift field as set?

Line 70:   static void createTable(org.apache.hadoop.hive.metastore.api.Table 
createManagedTable()? and then mention that msTbl must be a managed table

Line 85:       kudu.createTable(kuduTableName, schema, tableOpts);
May this throw if the table already exists? If so it might be cleaner to rely 
on that exception for checking whether the table already exists. Right now 
there's a possible race because tableExists() and createTable() are not atomic 
(i.e., kudu could still say the table already exists here)

Line 97:       throws ImpalaRuntimeException {
remove throws?

Line 121:       TCreateTableParams params, Schema schema) throws 
ImpalaRuntimeException {
remove throws?

Line 155:       int r = Integer.parseInt(replication);
catch NumberFormatException?

Line 170:     if (Table.isExternalTable(msTbl)) return;
Can we make this a Precondition and have the caller check it?

Line 178:       if (kudu.tableExists(tableName)) {
same comment about non-atomic exists() + drop() calls

Line 214:         String colName = ToSqlUtils.getIdentSql(colSchema.getName());
I don't think we want getIdentSql() here

Line 239:     } catch (Exception e) {
I think we should be more nuanced in our exception handling here, in 
particular, also include the original exception as the cause. For example, if 
for some reason we cannot connect to Kudu we will return a "table does not 
exist" without even including the original cause (presumably an IOE).
File fe/src/main/java/org/apache/impala/util/

Line 198:         metastoreTableName : "__IMPALA__" + metastoreDbName + "." + 
make the prefix a static constant (and change it to "impalad::" please)

Line 240:       case BOOL: return Type.BOOLEAN;

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Dimitris Tsirogiannis <>
Gerrit-Reviewer: Marcel Kornacker <>
Gerrit-Reviewer: Matthew Jacobs <>
Gerrit-Reviewer: Michael Brown <>
Gerrit-HasComments: Yes

Reply via email to