Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19391 )

Change subject: IMPALA-11812: Deduplicate column list in hmsPartitions
......................................................................


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/19391/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19391/3//COMMIT_MSG@15
PS3, Line 15: interned.
nit: comma instead of period


http://gerrit.cloudera.org:8080/#/c/19391/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/19391/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1842
PS3, Line 1842: msTbl.getPartitionKeys().size()
On line 1828 this value was stored as numClusteringCols_, so we could use that instead.


http://gerrit.cloudera.org:8080/#/c/19391/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/19391/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1877
PS3, Line 1877: dereference
nit: did you mean "deduplicate" instead of "dereference"?


http://gerrit.cloudera.org:8080/#/c/19391/3/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
File fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java:

http://gerrit.cloudera.org:8080/#/c/19391/3/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java@210
PS3, Line 210:       cols = deduplicateColumnList(partitions, cols);
I am wondering why we would pass in a null cols list to begin with. Could we
use the table-level columns, similar to what was done for addHmsPartitions()
(assuming the table is not an incomplete table)?

   Table tbl = catalog_.getTable(dbName, tblName);
   cols = tbl.getMetaStoreTable().getSd().getCols();


http://gerrit.cloudera.org:8080/#/c/19391/3/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java@224
PS3, Line 224:   public static List<FieldSchema> deduplicateColumnList(List<Partition> partitions,
Since this method is actually force-setting the columns list for each partition,
I think 'deduplicate' is not the best name. It gives the impression that
duplicate elimination is being done. Also, same comment as earlier: in what
situations do we expect a null cols list? Are there cases where we don't have
the table-level schema and instead have to use the first partition's schema?


http://gerrit.cloudera.org:8080/#/c/19391/3/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java@229
PS3, Line 229: p.getSd().getCols()
Could this also return null? If so, given a set of partitions, the first N
partitions would be set to null, and once the first non-null list is found,
the rest of the partitions would be set to that. If we want all of them to be
set to non-null, we could do 2 passes.
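
To illustrate the 2-pass idea, here is a rough sketch. It uses a hypothetical
stand-in class (FakePartition, with plain strings for columns) rather than the
real Hive Partition/FieldSchema types, and the class and method names are
illustrative only, not taken from the patch:

```java
import java.util.List;

// Illustrative only: none of these names come from the patch under review.
final class TwoPassColumnSharing {

  // Hypothetical stand-in for a partition whose column list may be null.
  static final class FakePartition {
    private List<String> cols;
    FakePartition(List<String> cols) { this.cols = cols; }
    List<String> getCols() { return cols; }
    void setCols(List<String> cols) { this.cols = cols; }
  }

  // Returns the shared column list, or null if no non-null list exists.
  static List<String> shareColumnList(
      List<FakePartition> partitions, List<String> cols) {
    // Pass 1: if the caller gave no list, pick the first non-null one.
    if (cols == null) {
      for (FakePartition p : partitions) {
        if (p.getCols() != null) {
          cols = p.getCols();
          break;
        }
      }
    }
    // Nothing to share: leave the partitions untouched.
    if (cols == null) return null;
    // Pass 2: point every partition, including the leading null ones,
    // at the same list instance.
    for (FakePartition p : partitions) {
      p.setCols(cols);
    }
    return cols;
  }
}
```

With this shape, partitions that appear before the first non-null list also
end up sharing it, instead of keeping a null list.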



--
To view, visit http://gerrit.cloudera.org:8080/19391
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I511ecca0ace8bea4c24a19a54fb0a75390e50c4d
Gerrit-Change-Number: 19391
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Comment-Date: Tue, 27 Dec 2022 16:18:28 +0000
Gerrit-HasComments: Yes
