Alex Behm has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables ......................................................................
Patch Set 5: (41 comments) Sending out the first wave. http://gerrit.cloudera.org:8080/#/c/4414/5//COMMIT_MSG Commit Message: Line 45: "KEY" is expected to be common enough that the ident style Might also want to mention that "key" is used for nested map types, so it's not desirable to make it a keyword. Line 51: on the HMS database and table name. If the database is "default", Does this last regarding "default" make the code easier or more complicated? What's the motivation for special casing it, i.e., what's the harm in including the "default" database name? http://gerrit.cloudera.org:8080/#/c/4414/5/be/src/service/frontend.cc File be/src/service/frontend.cc: Line 61: DEFINE_string(kudu_master_addrs, "", "Specifies the default Kudu master(s). The given " Can we make this more consistent with out existing options? For example: -catalog_service_host -state_store_host http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/analysis/ColumnDef.java File fe/src/main/java/org/apache/impala/analysis/ColumnDef.java: Line 176: static List<String> toColumnNames(Collection<ColumnDef> columnDefs) { colDefs Line 184: static Map<String, ColumnDef> mapByColumnNames(Collection<ColumnDef> colDefs) { comment that colDefs is assumed have no duplicate names, or alternatively deal with that case http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java: Line 99: "does not support (%s) file format. Supported formats are: (%s)", the (%s) file format. Line 100: createStmt_.getFileFormat().toString().replace("_", ""), what's with this weird "_" replacement? Line 101: "PARQUET, TEXTFILE, KUDU")); use SUPPORTED_INSERT_FORMATS instead of hardcoding http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java: Line 80: private void setColumnDefs(List<ColumnDef> columnDefs) { nit: colDefs everywhere Line 95: Map<String, String> getTblProperties() { return tableDef_.getTblProperties(); } is there any significance to not adding public/private to some of these methods? Line 192: "Only Kudu tables can use DISTRIBUTE BY clause."); the DISTRIBUTE BY clause http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/analysis/TableDef.java File fe/src/main/java/org/apache/impala/analysis/TableDef.java: Line 43: * Represents the table parameters in a CREATE TABLE statement. Let's be more precise and list the clauses that this captures. http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/catalog/KuduTable.java File fe/src/main/java/org/apache/impala/catalog/KuduTable.java: Line 97: // Table name in Kudu storage engine. It may not neccessarily be the same as the the Kudu storage engine Line 101: // 1. For managed tables, 'kuduTableName_' is prefixed with '__IMPALA__<db_name>' to Can we make the prefix 'impala::<db_name>' instead? Better to keep the "impala" part lowercase for consistency. It might be a minor detail, but keeping the name readable seems important. I'd imagine that Kudu tables will be accessed from several tools. Line 154: * are loaded from HMS. The function also updates the table schema in HMS. Add a comment about why we also update the HMS. My understanding is that we want to propagate alterations made to the Kudu table to the HMS. Line 195: msClient.alter_table(msTable_.getDbName(), msTable_.getTableName(), msTable_); This means every invalidate/refresh will do an extra alter to HMS. Are we concerned about the extra load? Line 202: * Loads the schema from the Kudu table including column definitions and primary key mention that it loads the schema into this table as well as the HMS table Line 216: String colName = ToSqlUtils.getIdentSql(colSchema.getName()); getIdentSql() may add backticks, I don't think we want that here Line 223: numClusteringCols_ = primaryKeyColumnNames_.size(); Check that there is at least one PK column Line 228: PartitionSchema partitionSchema = kuduTable.getPartitionSchema(); Preconditions.checkState(!colsByPos_.isEmpty()); ? Line 254: public static KuduTable createTmpTable(Db db, createCtasTarget()? Seems more explicit since there really is only one use. http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: Line 1117: if (db != null) dropTablesFromKudu(db); Won't this drop all Kudu tables even if CASCADE is not specified? Line 1151: // the metadata must be fetched from the Hive Metastore. Why do we need to load from HMS? See comments below. Line 1167: throw new ImpalaRuntimeException( This seems like weird behavior. The user issued a cascading drop db, but gets an error because one of the tables could not be fetched from the HMS. It seems fine to ignore the error because the user wanted to drop everything anyway. Line 1172: if (!KuduTable.isKuduTable(msTable)) continue; Why do we need this check? Won't Kudu tell us if the table does not exist? Seems fine to ignore that error. Line 1438: StorageDescriptor sd = HiveStorageDescriptorFactory.createSd( should we consolidate all the SD stuff into one createSd() call? Line 1453: if (params.getPartition_columns() != null) { for my understanding: these are empty for Kudu tables right? http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java: Line 41: //import org.apache.hadoop.hive.metastore.api.Table; ? Line 53: * such as creating and droping tables from Kudu. typo: dropping Line 58: public final static ColumnDef REQUIRED_THRIFT_COLUMN_DEF; Isn't it enough to set an empty list or mark the thrift field as set? Line 70: static void createTable(org.apache.hadoop.hive.metastore.api.Table msTbl, createManagedTable()? and then mention that msTbl must be a managed table Line 85: kudu.createTable(kuduTableName, schema, tableOpts); May this throw if the table already exists? If so it might be cleaner to rely on that exception for checking whether the table already exists. Right now there's a possible race because tableExists() and createTable() are not atomic (i.e., kudu could still say the table already exists here) Line 97: throws ImpalaRuntimeException { remove throws? Line 121: TCreateTableParams params, Schema schema) throws ImpalaRuntimeException { remove throws? Line 155: int r = Integer.parseInt(replication); catch NumberFormatException? Line 170: if (Table.isExternalTable(msTbl)) return; Can we make this a Precondition and have the caller check it? Line 178: if (kudu.tableExists(tableName)) { same comment about non-atomic exists() + drop() calls Line 214: String colName = ToSqlUtils.getIdentSql(colSchema.getName()); I don't think we want getIdentSql() here Line 239: } catch (Exception e) { I think we should be more nuanced in our exception handling here, in particular, also include the original exception as the cause. For example, if for some reason we cannot connect to Kudu we will return a "table does not exist" without even including the original cause (presumably an IOE). http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/util/KuduUtil.java File fe/src/main/java/org/apache/impala/util/KuduUtil.java: Line 198: metastoreTableName : "__IMPALA__" + metastoreDbName + "." + metastoreTableName; make the prefix a static constant (and change it to "impalad::" please) Line 240: case BOOL: return Type.BOOLEAN; no DECIMAL? -- To view, visit http://gerrit.cloudera.org:8080/4414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-Reviewer: Michael Brown <[email protected]> Gerrit-HasComments: Yes
