Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables ......................................................................
Patch Set 3: (6 comments) http://gerrit.cloudera.org:8080/#/c/4414/3/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: PS3, Line 350: // Parameters needed for hash distribution : struct TDistributeByHashParam { : 1: required list<string> columns : 2: required i32 num_buckets : } : : struct TRangeLiteral { : 1: optional i64 int_literal : 2: optional string string_literal : } : : struct TRangeLiteralList { : 1: required list<TRangeLiteral> values : } : : // A range distribution is identified by a list of columns and a series of split rows. : struct TDistributeByRangeParam { : 1: required list<string> columns : 2: optional list<TRangeLiteralList> split_rows; : } : : // Parameters for the DISTRIBUTE BY clause. The actual distribution is identified by : // the type parameter. : struct TDistributeParam { : // Set if type is set to HASH : 1: optional TDistributeByHashParam by_hash_param; : : // Set if type is set to RANGE : 2: optional TDistributeByRangeParam by_range_param; : } > We store the distribute by params in the KuduTable class which is then seri I don't see why this needs to be in the TKuduTable though. I think we just need this in the TCreateTableParams. I wouldn't mind except it's more work to reason about when it's set/not set in a TKuduTable. PS3, Line 390: : // Distribution schemes : 4: required list<TDistributeParam> distribute_by > See my comment above. We do load and serialize the distribute by params in I don't see why this needs to be in TKuduTable, only TCreateTableParams. http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/DistributeParam.java File fe/src/main/java/org/apache/impala/analysis/DistributeParam.java: PS3, Line 156: if (splitRows_ == null) { : builder.append("..."); > This is possible because an object of DistributeParam class is created in t In case #2 why do we even bother creating a DistributeParams? The partitioning/distribution info only really makes sense at the time of table creation I think, right? This goes back to our conversation (in person) about how it'd probably make sense to separate the partitioning part of the command from CREATE TABLE, and do partitioning more like how we would for non-Kudu tables, i.e. separate commands. Obviously that hasn't been worked through and realistically we probably won't get to that in time. If we don't get to something like that, I think the only reason we'd want #2 (in terms of loading DistributeParams) is SHOW CREATE TABLE, so this class only needs to support toSql() in case #2. Is that right? Can you add a comment in the class header or the relevant constructor where we might get case #2 so it's easier to reason about? A TODO / JIRA ref about resolving the '...' would be good. It might be good to add a test case with this output if there isn't one already to make sure we fix it properly and/or it doesn't change in some other unexpected way. PS3, Line 191: if (splitRows_ == null) { : result.setBy_range_param(rangeParam); : return result; : } > See explanation above. I guess it's confusing to me because TDistributeParam doesn't seem to be necessary in case #2. In case #1 it gets added to a TCreateTableStmt object. In case #2 it gets added to a TKuduTable object, but it seems unnecessary given that no other backend or even the catalog care about the distribute params - right? If that's true, I think we could avoid storing this in KuduTable, then avoid calling this in KuduTable.getTKuduTable(), and then throwing here if splitRows_ is null since I don't think this should be called in this case. http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java File fe/src/main/java/org/apache/impala/analysis/TableDef.java: PS3, Line 126: return fullyQualifiedTableName_ != null ? fullyQualifiedTableName_ : tableName_; > We do call this function before tableName_ has been analyzed. One example i ah ok, can you add a comment? http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java: PS3, Line 243: List<String> primaryKeysSql, String kuduDistributeByParams, can you update the comment, especially around which of these should/can be null vs. empty. -- To view, visit http://gerrit.cloudera.org:8080/4414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-HasComments: Yes
