Tamas Mate has posted comments on this change. ( http://gerrit.cloudera.org:8080/19547 )
Change subject: IMPALA-11950: Planner change for Iceberg metadata querying ...................................................................... Patch Set 1: (17 comments) Thank you for the reviews, updated the patch. http://gerrit.cloudera.org:8080/#/c/19547/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19547/1//COMMIT_MSG@9 PS1, Line 9: , this > Nit: should start a new sentence here. Done http://gerrit.cloudera.org:8080/#/c/19547/1//COMMIT_MSG@11 PS1, Line 11: writing the backend : part > Is it going to be in another patch or this one? If another one, can you ind Done http://gerrit.cloudera.org:8080/#/c/19547/1/fe/src/main/java/org/apache/impala/analysis/Path.java File fe/src/main/java/org/apache/impala/analysis/Path.java: http://gerrit.cloudera.org:8080/#/c/19547/1/fe/src/main/java/org/apache/impala/analysis/Path.java@510 PS1, Line 510: org.apache.hadoop.hive.metastore.api.Table msTable = rootTable_.getMetaStoreTable(); > Do you use this 'msTable' anywhere? Done http://gerrit.cloudera.org:8080/#/c/19547/1/fe/src/main/java/org/apache/impala/analysis/TableName.java File fe/src/main/java/org/apache/impala/analysis/TableName.java: http://gerrit.cloudera.org:8080/#/c/19547/1/fe/src/main/java/org/apache/impala/analysis/TableName.java@59 PS1, Line 59: Preconditions.checkNotNull(tbl); > Is it allowed to be empty? During candidate table creation this constructor is used, so empty table name is allowed here. http://gerrit.cloudera.org:8080/#/c/19547/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java File fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java: http://gerrit.cloudera.org:8080/#/c/19547/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java@52 PS1, Line 52: metadataTableName_ = tblRefPath.get(2); > Can there be interference with for example CollectionTableRefs which may ha This is only called when isIcebergMetadataTable() returns true, so only for iceberg tables and when the third part of the table name matches the Iceberg metadtat table name. http://gerrit.cloudera.org:8080/#/c/19547/1/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: http://gerrit.cloudera.org:8080/#/c/19547/1/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@332 PS1, Line 332: RANDOM > Isn't DataPartition.UNPARTITIONED is more suitable for metadata table scan It is, nice catch. While I was testing I wanted to see some more exciting plans, this removes some exchange nodes. http://gerrit.cloudera.org:8080/#/c/19547/1/fe/src/main/java/org/apache/impala/planner/IcebergMetadataScanNode.java File fe/src/main/java/org/apache/impala/planner/IcebergMetadataScanNode.java: http://gerrit.cloudera.org:8080/#/c/19547/1/fe/src/main/java/org/apache/impala/planner/IcebergMetadataScanNode.java@36 PS1, Line 36: desc_ = desc; > Do we allow a null desc_? If not, we could have a precondition check. I did not see that for HdfsScanNode/HbaseScanNode so I would keep it this way. I assume it cannot be null. http://gerrit.cloudera.org:8080/#/c/19547/1/fe/src/main/java/org/apache/impala/planner/IcebergMetadataScanNode.java@47 PS1, Line 47: Preconditions.checkNotNull(desc_.getPath()); > If desc_ can be null (see comment on L36), we should also check that it is Done http://gerrit.cloudera.org:8080/#/c/19547/1/fe/src/main/java/org/apache/impala/planner/IcebergMetadataScanNode.java@73 PS1, Line 73: > nit: trailing spaces Done http://gerrit.cloudera.org:8080/#/c/19547/1/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: http://gerrit.cloudera.org:8080/#/c/19547/1/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@2223 PS1, Line 2223: Metadta > typo Done http://gerrit.cloudera.org:8080/#/c/19547/1/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/19547/1/fe/src/main/java/org/apache/impala/service/Frontend.java@2217 PS1, Line 2217: t > Nit: extra t. Done http://gerrit.cloudera.org:8080/#/c/19547/1/fe/src/main/java/org/apache/impala/service/Frontend.java@2220 PS1, Line 2220: t > Nit: extra t. Done http://gerrit.cloudera.org:8080/#/c/19547/1/fe/src/test/java/org/apache/impala/planner/PlannerTest.java File fe/src/test/java/org/apache/impala/planner/PlannerTest.java: http://gerrit.cloudera.org:8080/#/c/19547/1/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@1288 PS1, Line 1288: metadtat > nit: typo Done http://gerrit.cloudera.org:8080/#/c/19547/1/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-metadata-table-scan.test File testdata/workloads/functional-planner/queries/PlannerTest/iceberg-metadata-table-scan.test: http://gerrit.cloudera.org:8080/#/c/19547/1/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-metadata-table-scan.test@1 PS1, Line 1: explain SELECT * FROM functional_parquet.iceberg_alltypes_part_orc.history > Just a general comment to have coverage for most of the metadata tables her I added coverage on parser side, do you think it would be useful to have plans for all the tables as well? Also, added a plan with a nested type. http://gerrit.cloudera.org:8080/#/c/19547/1/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-metadata-table-scan.test@6 PS1, Line 6: row-size=33B > It might be a valid use case to do some projection of the columns and check Added a new test with projection. http://gerrit.cloudera.org:8080/#/c/19547/1/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-metadata-table-scan.test@7 PS1, Line 7: ---- DISTRIBUTEDPLAN > This line just made me wondering if it makes any sense to run a metadata qu Yes, the fragment will be executed on one node. I don't think it makes sense at this point to split the scanning. http://gerrit.cloudera.org:8080/#/c/19547/1/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-metadata-table-scan.test@29 PS1, Line 29: row-size=0B > Is it correct that the row-size for the scans here are zero while the join Nice catch, this is the result of the planning, missed to calculate the stats for the nodes. Fixed it. -- To view, visit http://gerrit.cloudera.org:8080/19547 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3675d7a57ca570bfec306798589b5ef6aa34b5c6 Gerrit-Change-Number: 19547 Gerrit-PatchSet: 1 Gerrit-Owner: Tamas Mate <[email protected]> Gerrit-Reviewer: Anonymous Coward <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Gergely Fürnstáhl <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Tamas Mate <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Fri, 10 Mar 2023 16:47:49 +0000 Gerrit-HasComments: Yes
