Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23536 )

Change subject: IMPALA-13066: Extend SHOW CREATE TABLE to include stats and 
partitions
......................................................................


Patch Set 10:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/23536/10/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/23536/10/common/thrift/ImpalaService.thrift@1075
PS10, Line 1075:   PARTITION_LIMIT = 198
The name is too generic that it seems a limit on all statements, including SHOW 
PARTITIONS and SELECT statements. Might be more clear if we use names like 
SHOW_CREATE_TABLE_PARTITION_LIMIT.


http://gerrit.cloudera.org:8080/#/c/23536/10/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java:

http://gerrit.cloudera.org:8080/#/c/23536/10/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java@180
PS10, Line 180:     removeHiddenTableProperties(createTableProps);
It seems this is the only difference between 'createTableProps' and 
'alterTableProps'. If so, we can refactor the code to simplify the duplicate 
operations on them:
* move this to the end of this method and use one map before calling it
* duplicate a new map for 'createTableProps' and invoke 
removeHiddenTableProperties() on it.


http://gerrit.cloudera.org:8080/#/c/23536/10/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java@494
PS10, Line 494:     if (rawProps.containsKey(Table.TBL_PROP_LAST_DDL_TIME)) {
              :       rawProps.remove(Table.TBL_PROP_LAST_DDL_TIME);
              :     }
nit: we can simplify this to only use remove().


http://gerrit.cloudera.org:8080/#/c/23536/10/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java@658
PS10, Line 658:     // Add numRows if available
              :     long numRows = table.getNumRows();
              :     if (numRows < 0) {
              :       try {
              :         Map<String, String> params = msTbl != null ? 
msTbl.getParameters() : null;
              :         String rc = params != null ? 
params.get(StatsSetupConst.ROW_COUNT) : null;
              :         if (rc != null) numRows = Long.parseLong(rc);
              :       } catch (Exception e) {
              :         // ignore and keep numRows as unknown
              :       }
              :     }
              :     if (numRows >= 0) {
              :       allTblProps.put("numRows", Long.toString(numRows));
              :     }
Do we still need these if "numRows" is not removed in filterTblProperties()?


http://gerrit.cloudera.org:8080/#/c/23536/10/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java@691
PS10, Line 691:     // Column stats for non-partition columns: emit a line per 
column including
              :     // all applicable stats keys.
              :     int numClusterCols = table.getNumClusteringCols();
              :     boolean hasColumnStats = false;
              :     for (int i = numClusterCols; i < table.getColumns().size(); 
i++) {
              :       Column c = table.getColumns().get(i);
              :       ColumnStats s = c.getStats();
              :       if (s == null) continue;
              :       boolean isFixed = c.getType() != null && 
c.getType().isFixedLengthType();
              :
              :       List<String> kvs = new ArrayList<>();
              :       // Always include NDV and nulls (may be -1 for unknown)
              :       kvs.add("'numDVs'='" + s.getNumDistinctValues() + "'");
              :       kvs.add("'numNulls'='" + s.getNumNulls() + "'");
              :       // Include size stats only for variable-length types
              :       if (!isFixed) {
              :         kvs.add("'maxSize'='" + s.getMaxSize() + "'");
              :         double avg = s.getAvgSize();
              :         String avgStr = (Math.rint(avg) == avg) ? 
Long.toString((long) avg) : Double.toString(avg);
              :         kvs.add("'avgSize'='" + avgStr + "'");
              :       }
              :       // Include boolean-specific counts (may be -1 for 
non-boolean types)
              :       kvs.add("'numTrues'='" + s.getNumTrues() + "'");
              :       kvs.add("'numFalses'='" + s.getNumFalses() + "'");
              :
              :       if (!hasColumnStats) {
              :         hasColumnStats = true;
              :       }
              :       out.append("ALTER TABLE ")
              :           
.append(getIdentSql(table.getDb().getName())).append('.')
              :           .append(getIdentSql(table.getName())).append(' ')
              :           .append("SET COLUMN STATS ")
              :           .append(getIdentSql(c.getName())).append(" (")
              :           .append(Joiner.on(", ").join(kvs)).append(");\n");
              :     }
              :     if (hasColumnStats) {
              :       out.append("\n");
              :     }
Can we extract these into a method which returns 'hasColumnStats'? The current 
getCreateTableWithStatsSql() method is too long   (>200 lines). We'd better 
make it shorter. The following codes adding ADD PARTITION and ALTER PARTITION 
statements can also be extracted into separate methods.


http://gerrit.cloudera.org:8080/#/c/23536/10/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java@735
PS10, Line 735: numClusterCols > 0
I think we can move "numClusterCols > 0" to the check on L731. "numClusterCols 
> 0" means partitioned tables.


http://gerrit.cloudera.org:8080/#/c/23536/10/tests/metadata/test_show_create_table.py
File tests/metadata/test_show_create_table.py:

http://gerrit.cloudera.org:8080/#/c/23536/10/tests/metadata/test_show_create_table.py@261
PS10, Line 261:     """ Extract properties from partition-level SET 
TBLPROPERTIES statements.
              :     Handles statements like:
              :     ALTER TABLE ... PARTITION (p=1) SET TBLPROPERTIES 
('key'='value', ...)
              :     The regular properties_map_regex fails here because of 
nested parentheses.
I think the problem is due to missing a space between "TBLPROPERTIES" and "(" 
in the ALTER PARTITION statement:
https://gerrit.cloudera.org/c/23536/10/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java#834

Replacing the space after "%s" with "\s*" in properties_map_regex() can solve 
the problem. We should change it to raw string by the way.

 def properties_map_regex(name):
-  return "%s \\(([^)]+)\\)" % name
+  return r"%s\s*\(([^)]+)\)" % name



--
To view, visit http://gerrit.cloudera.org:8080/23536
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I87950ae9d9bb73cb2a435cf5bcad076df1570dc2
Gerrit-Change-Number: 23536
Gerrit-PatchSet: 10
Gerrit-Owner: Arnab Karmakar <[email protected]>
Gerrit-Reviewer: Arnab Karmakar <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Comment-Date: Wed, 05 Nov 2025 01:51:47 +0000
Gerrit-HasComments: Yes

Reply via email to