Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/19391 )
Change subject: IMPALA-11812: Deduplicate column schema in hmsPartitions ...................................................................... Patch Set 6: (3 comments) http://gerrit.cloudera.org:8080/#/c/19391/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/19391/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1862 PS6, Line 1862: fetchPartitionsByName > nit. This method probably should be renamed to fetchPartitionsByTable() due I think the name is ok since the 'ByName' refers to the names of the partitions rather than the table. There's a corresponding underlying HMS api called get_partitions_by_name() so this is a higher level wrapper for that. http://gerrit.cloudera.org:8080/#/c/19391/6/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java File fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java: http://gerrit.cloudera.org:8080/#/c/19391/6/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java@208 PS6, Line 208: LOG.info("Fetched {}/{} partitions for table {}.{}", numDone, partNames.size(), : msTbl.getDbName(), msTbl.getTableName()); > This probably will log <maxPartitionsPerRpc_> number of progressive message Actually, the number of logged messages would be partNames.size()/maxPartitionsPerRpc_ . If there are 5000 partitions to fetch and the batch size per RPC is 1000 partitions, then there will be 5 such messages. If the batch size is too small, yes it will generate more messages but in general it is not a bad idea to track the progress. I am ok either way but will let Quanlong respond. http://gerrit.cloudera.org:8080/#/c/19391/6/tests/custom_cluster/test_wide_table_operations.py File tests/custom_cluster/test_wide_table_operations.py: http://gerrit.cloudera.org:8080/#/c/19391/6/tests/custom_cluster/test_wide_table_operations.py@36 PS6, Line 36: if cls.exploration_strategy() != 'exhaustive': This is a good test case and given the high runtime it makes sense to restrict this to exhaustive run only. I am wondering if there is any intermediate message that should be displayed to the console ..e.g after each of the recover_stmt, invalidate_stmt, show_parts_stmt execute, a success message could be shown. Also, how long does table creation itself take ? Developers may want to know if things are stuck or progressing. However, this is a nice to have, not must do. -- To view, visit http://gerrit.cloudera.org:8080/19391 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I511ecca0ace8bea4c24a19a54fb0a75390e50c4d Gerrit-Change-Number: 19391 Gerrit-PatchSet: 6 Gerrit-Owner: Quanlong Huang <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Comment-Date: Thu, 29 Dec 2022 20:39:11 +0000 Gerrit-HasComments: Yes
