Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/19391 )

Change subject: IMPALA-11812: Deduplicate column list in hmsPartitions
......................................................................


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/19391/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19391/3//COMMIT_MSG@15
PS3, Line 15: interned.
nit: comma instead of period


http://gerrit.cloudera.org:8080/#/c/19391/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/19391/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1842
PS3, Line 1842: msTbl.getPartitionKeys().size()
On line 1828 this value was stored as numClusteringCols_ so we could use that.


http://gerrit.cloudera.org:8080/#/c/19391/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/19391/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1877
PS3, Line 1877: dereference
nit: did you mean deduplicate instead of dereference?


http://gerrit.cloudera.org:8080/#/c/19391/3/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
File fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java:

http://gerrit.cloudera.org:8080/#/c/19391/3/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java@210
PS3, Line 210: cols = deduplicateColumnList(partitions, cols);
I am wondering why we would pass in a null cols list to begin with. Could we use the table level columns similar to what was done for addHmsPartitions() (assuming the table is not an incomplete table)?

  Table tbl = catalog_.getTable(dbName, tblName);
  cols = tbl.getMetaStoreTable().getSd().getCols()


http://gerrit.cloudera.org:8080/#/c/19391/3/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java@224
PS3, Line 224: public static List<FieldSchema> deduplicateColumnList(List<Partition> partitions,
Since this method is actually force-setting the columns list for the partition, I think 'deduplicate' is not the best name. It gives the impression that a duplicate elimination is being done.

Also, same comment as earlier .. what are the situations where we expect a null cols list? Are there cases where we don't have the table level schema and instead have to use the first partition's schema?


http://gerrit.cloudera.org:8080/#/c/19391/3/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java@229
PS3, Line 229: p.getSd().getCols()
Could this also return null? If so, given a set of partitions, we would have first N partitions set to null and once the first non-null is found then the rest of the partitions would be set to that. If we want all of them to be set to non-null, we could do 2 passes.


-- 
To view, visit http://gerrit.cloudera.org:8080/19391
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I511ecca0ace8bea4c24a19a54fb0a75390e50c4d
Gerrit-Change-Number: 19391
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Comment-Date: Tue, 27 Dec 2022 16:18:28 +0000
Gerrit-HasComments: Yes
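
For reference, the two-pass idea raised in the comment on MetaStoreUtil.java line 229 could look roughly like the sketch below. It is illustrative only and is not the code in the patch: `FieldSchema`, `StorageDescriptor` and `Partition` are simplified stand-ins for the Hive Metastore classes (only the accessors mentioned in the review are modelled), and the class and method names `ColumnSharingSketch` and `shareColumnList` are hypothetical. The sketch also assumes `getSd()` never returns null, which may not hold for real partitions.

```java
import java.util.List;

// Simplified stand-ins for the Hive Metastore classes (assumption: only the
// accessors mentioned in the review are needed for this illustration).
final class ColumnSharingSketch {
  static final class FieldSchema {
    final String name;
    final String type;
    FieldSchema(String name, String type) { this.name = name; this.type = type; }
  }

  static final class StorageDescriptor {
    private List<FieldSchema> cols;
    List<FieldSchema> getCols() { return cols; }
    void setCols(List<FieldSchema> cols) { this.cols = cols; }
  }

  static final class Partition {
    private final StorageDescriptor sd = new StorageDescriptor();
    StorageDescriptor getSd() { return sd; }
  }

  /**
   * Hypothetical two-pass variant: pass 1 picks the column list to share
   * (the caller's list if given, else the first non-null partition list);
   * pass 2 points every partition at that single list instance.
   * Returns the shared list, or null if no non-null list was found.
   */
  static List<FieldSchema> shareColumnList(
      List<Partition> partitions, List<FieldSchema> cols) {
    // Pass 1: find a non-null column list if the caller did not supply one.
    if (cols == null) {
      for (Partition p : partitions) {
        List<FieldSchema> candidate = p.getSd().getCols();
        if (candidate != null) {
          cols = candidate;
          break;
        }
      }
    }
    // Nothing to share: leave the partitions untouched.
    if (cols == null) return null;

    // Pass 2: every partition, including those before the first non-null
    // one, now references the same list.
    for (Partition p : partitions) {
      p.getSd().setCols(cols);
    }
    return cols;
  }
}
```

With this shape, partitions that precede the first non-null column list are no longer left with null columns, which is the case the single-pass loop would miss.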
