Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/19547 )
Change subject: IMPALA-11950: Planner change for Iceberg metadata querying ...................................................................... Patch Set 1: (9 comments) http://gerrit.cloudera.org:8080/#/c/19547/1/fe/src/main/java/org/apache/impala/analysis/Path.java File fe/src/main/java/org/apache/impala/analysis/Path.java: http://gerrit.cloudera.org:8080/#/c/19547/1/fe/src/main/java/org/apache/impala/analysis/Path.java@510 PS1, Line 510: org.apache.hadoop.hive.metastore.api.Table msTable = rootTable_.getMetaStoreTable(); Do you use this 'msTable' anywhere? http://gerrit.cloudera.org:8080/#/c/19547/1/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: http://gerrit.cloudera.org:8080/#/c/19547/1/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@332 PS1, Line 332: RANDOM Isn't DataPartition.UNPARTITIONED is more suitable for metadata table scan nodes? http://gerrit.cloudera.org:8080/#/c/19547/1/fe/src/main/java/org/apache/impala/planner/IcebergMetadataScanNode.java File fe/src/main/java/org/apache/impala/planner/IcebergMetadataScanNode.java: http://gerrit.cloudera.org:8080/#/c/19547/1/fe/src/main/java/org/apache/impala/planner/IcebergMetadataScanNode.java@73 PS1, Line 73: nit: trailing spaces http://gerrit.cloudera.org:8080/#/c/19547/1/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: http://gerrit.cloudera.org:8080/#/c/19547/1/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@2223 PS1, Line 2223: Metadta typo http://gerrit.cloudera.org:8080/#/c/19547/1/fe/src/test/java/org/apache/impala/planner/PlannerTest.java File fe/src/test/java/org/apache/impala/planner/PlannerTest.java: http://gerrit.cloudera.org:8080/#/c/19547/1/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@1288 PS1, Line 1288: metadtat nit: typo http://gerrit.cloudera.org:8080/#/c/19547/1/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-metadata-table-scan.test File testdata/workloads/functional-planner/queries/PlannerTest/iceberg-metadata-table-scan.test: http://gerrit.cloudera.org:8080/#/c/19547/1/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-metadata-table-scan.test@1 PS1, Line 1: explain SELECT * FROM functional_parquet.iceberg_alltypes_part_orc.history Just a general comment to have coverage for most of the metadata tables here. Additionally, experimenting with nested columns and how they are actually represented as tuple/slotdescriptors and how they affect the row-size would be also beneficial here, in my opinion. http://gerrit.cloudera.org:8080/#/c/19547/1/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-metadata-table-scan.test@6 PS1, Line 6: row-size=33B It might be a valid use case to do some projection of the columns and check if the row-size decreases accordingly. Additionally, if we add the same column multiple times in the select list, the row-size shouldn't grow as we can do a single read for the redundant columns. http://gerrit.cloudera.org:8080/#/c/19547/1/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-metadata-table-scan.test@7 PS1, Line 7: ---- DISTRIBUTEDPLAN This line just made me wondering if it makes any sense to run a metadata query scan in a distributed way. I guess the whole query would be executed on one node, right? http://gerrit.cloudera.org:8080/#/c/19547/1/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-metadata-table-scan.test@29 PS1, Line 29: row-size=0B Is it correct that the row-size for the scans here are zero while the join has a row-size of 66? I'd expect that the scans would have a row size of 33, otherwise there won't be any space in the row batch to populate the scan results to. I'd check what slots are materialized on these nodes. -- To view, visit http://gerrit.cloudera.org:8080/19547 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3675d7a57ca570bfec306798589b5ef6aa34b5c6 Gerrit-Change-Number: 19547 Gerrit-PatchSet: 1 Gerrit-Owner: Tamas Mate <[email protected]> Gerrit-Reviewer: Anonymous Coward <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Gergely Fürnstáhl <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Thu, 02 Mar 2023 14:35:23 +0000 Gerrit-HasComments: Yes
