Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/23536 )
Change subject: IMPALA-13066: Extend SHOW CREATE TABLE to include stats and partitions ...................................................................... Patch Set 10: (7 comments) http://gerrit.cloudera.org:8080/#/c/23536/10/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: http://gerrit.cloudera.org:8080/#/c/23536/10/common/thrift/ImpalaService.thrift@1075 PS10, Line 1075: PARTITION_LIMIT = 198 The name is too generic that it seems a limit on all statements, including SHOW PARTITIONS and SELECT statements. Might be more clear if we use names like SHOW_CREATE_TABLE_PARTITION_LIMIT. http://gerrit.cloudera.org:8080/#/c/23536/10/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java: http://gerrit.cloudera.org:8080/#/c/23536/10/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java@180 PS10, Line 180: removeHiddenTableProperties(createTableProps); It seems this is the only difference between 'createTableProps' and 'alterTableProps'. If so, we can refactor the code to simplify the duplicate operations on them: * move this to the end of this method and use one map before calling it * duplicate a new map for 'createTableProps' and invoke removeHiddenTableProperties() on it. http://gerrit.cloudera.org:8080/#/c/23536/10/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java@494 PS10, Line 494: if (rawProps.containsKey(Table.TBL_PROP_LAST_DDL_TIME)) { : rawProps.remove(Table.TBL_PROP_LAST_DDL_TIME); : } nit: we can simplify this to only use remove(). http://gerrit.cloudera.org:8080/#/c/23536/10/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java@658 PS10, Line 658: // Add numRows if available : long numRows = table.getNumRows(); : if (numRows < 0) { : try { : Map<String, String> params = msTbl != null ? msTbl.getParameters() : null; : String rc = params != null ? params.get(StatsSetupConst.ROW_COUNT) : null; : if (rc != null) numRows = Long.parseLong(rc); : } catch (Exception e) { : // ignore and keep numRows as unknown : } : } : if (numRows >= 0) { : allTblProps.put("numRows", Long.toString(numRows)); : } Do we still need these if "numRows" is not removed in filterTblProperties()? http://gerrit.cloudera.org:8080/#/c/23536/10/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java@691 PS10, Line 691: // Column stats for non-partition columns: emit a line per column including : // all applicable stats keys. : int numClusterCols = table.getNumClusteringCols(); : boolean hasColumnStats = false; : for (int i = numClusterCols; i < table.getColumns().size(); i++) { : Column c = table.getColumns().get(i); : ColumnStats s = c.getStats(); : if (s == null) continue; : boolean isFixed = c.getType() != null && c.getType().isFixedLengthType(); : : List<String> kvs = new ArrayList<>(); : // Always include NDV and nulls (may be -1 for unknown) : kvs.add("'numDVs'='" + s.getNumDistinctValues() + "'"); : kvs.add("'numNulls'='" + s.getNumNulls() + "'"); : // Include size stats only for variable-length types : if (!isFixed) { : kvs.add("'maxSize'='" + s.getMaxSize() + "'"); : double avg = s.getAvgSize(); : String avgStr = (Math.rint(avg) == avg) ? Long.toString((long) avg) : Double.toString(avg); : kvs.add("'avgSize'='" + avgStr + "'"); : } : // Include boolean-specific counts (may be -1 for non-boolean types) : kvs.add("'numTrues'='" + s.getNumTrues() + "'"); : kvs.add("'numFalses'='" + s.getNumFalses() + "'"); : : if (!hasColumnStats) { : hasColumnStats = true; : } : out.append("ALTER TABLE ") : .append(getIdentSql(table.getDb().getName())).append('.') : .append(getIdentSql(table.getName())).append(' ') : .append("SET COLUMN STATS ") : .append(getIdentSql(c.getName())).append(" (") : .append(Joiner.on(", ").join(kvs)).append(");\n"); : } : if (hasColumnStats) { : out.append("\n"); : } Can we extract these into a method which returns 'hasColumnStats'? The current getCreateTableWithStatsSql() method is too long (>200 lines). We'd better make it shorter. The following codes adding ADD PARTITION and ALTER PARTITION statements can also be extracted into separate methods. http://gerrit.cloudera.org:8080/#/c/23536/10/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java@735 PS10, Line 735: numClusterCols > 0 I think we can move "numClusterCols > 0" to the check on L731. "numClusterCols > 0" means partitioned tables. http://gerrit.cloudera.org:8080/#/c/23536/10/tests/metadata/test_show_create_table.py File tests/metadata/test_show_create_table.py: http://gerrit.cloudera.org:8080/#/c/23536/10/tests/metadata/test_show_create_table.py@261 PS10, Line 261: """ Extract properties from partition-level SET TBLPROPERTIES statements. : Handles statements like: : ALTER TABLE ... PARTITION (p=1) SET TBLPROPERTIES ('key'='value', ...) : The regular properties_map_regex fails here because of nested parentheses. I think the problem is due to missing a space between "TBLPROPERTIES" and "(" in the ALTER PARTITION statement: https://gerrit.cloudera.org/c/23536/10/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java#834 Replacing the space after "%s" with "\s*" in properties_map_regex() can solve the problem. We should change it to raw string by the way. def properties_map_regex(name): - return "%s \\(([^)]+)\\)" % name + return r"%s\s*\(([^)]+)\)" % name -- To view, visit http://gerrit.cloudera.org:8080/23536 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I87950ae9d9bb73cb2a435cf5bcad076df1570dc2 Gerrit-Change-Number: 23536 Gerrit-PatchSet: 10 Gerrit-Owner: Arnab Karmakar <[email protected]> Gerrit-Reviewer: Arnab Karmakar <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Comment-Date: Wed, 05 Nov 2025 01:51:47 +0000 Gerrit-HasComments: Yes
