Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16143 )
Change subject: IMPALA-9741: Support querying Iceberg table by impala ...................................................................... Patch Set 18: (10 comments) One thing I didn't understand about the patch is the support for partition pruning. I don't see enough code here to handle all the edge cases - partition evolution, different partition transforms, etc. I'm OK with deferring optimizations to a later patch, but want to make sure that this patch is correct and that we document what the limitations are. http://gerrit.cloudera.org:8080/#/c/16143/18/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java: http://gerrit.cloudera.org:8080/#/c/16143/18/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java@611 PS18, Line 611: // Iceberg table only has one partition spec now What does this mean? We don't support partition evolution yet? http://gerrit.cloudera.org:8080/#/c/16143/18/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java@615 PS18, Line 615: filedName nit: fieldName http://gerrit.cloudera.org:8080/#/c/16143/18/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/16143/18/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1889 PS18, Line 1889: public Can you make this protected instead of public, since it's only being used within this package. http://gerrit.cloudera.org:8080/#/c/16143/18/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java File fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java: http://gerrit.cloudera.org:8080/#/c/16143/18/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java@372 PS18, Line 372: public I think this can be protected http://gerrit.cloudera.org:8080/#/c/16143/18/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java: http://gerrit.cloudera.org:8080/#/c/16143/18/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@61 PS18, Line 61: icebergConjunts_ nit: icebergConjuncts_ http://gerrit.cloudera.org:8080/#/c/16143/18/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@94 PS18, Line 94: Icber nit: Iceberg http://gerrit.cloudera.org:8080/#/c/16143/18/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@123 PS18, Line 123: it.remove(); I think this predicate pushdown needs a bit more explanation. As far as I can see there are three possibilities for predicate pushdown: 1. The column is not part of any Iceberg partition expression so cannot be pushed down at all. 2. The column is part of all partition keys without any transformation (i.e. the identity transformation). https://iceberg.apache.org/spec/#partitioning - the predicate can be pushed down and does not need to be removed at all. 3. The column is part of some or all partition keys, or is transformed in some way during partitioning - we can push down the predicate to get partition pruning, but also need to evaluate it in the scan. I guess the two cases here are that there has been some partition evolution over time, and that there is some transform of the partition column. It's not clear how the code is handling #3. I think this needs to be clarified. The strict vs inclusive projection in the spec might be useful for explaining it - https://iceberg.apache.org/spec/#scan-planning http://gerrit.cloudera.org:8080/#/c/16143/18/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@226 PS18, Line 226: * Returns true if this column does not belong to iceberg partition column. What does this mean in terms of the iceberg spec - https://iceberg.apache.org/spec/#partitioning. Does this mean that it only returns true if the column is part of the partition with an identity transform? What about if partition evolution has happened and it is a partition column for some but not all columns? http://gerrit.cloudera.org:8080/#/c/16143/18/testdata/data/README File testdata/data/README: http://gerrit.cloudera.org:8080/#/c/16143/18/testdata/data/README@508 PS18, Line 508: <<<<<<< HEAD nit: remove this merge marker http://gerrit.cloudera.org:8080/#/c/16143/18/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test: http://gerrit.cloudera.org:8080/#/c/16143/18/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@3 PS18, Line 3: Icebreg nit: Iceberg -- To view, visit http://gerrit.cloudera.org:8080/16143 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006 Gerrit-Change-Number: 16143 Gerrit-PatchSet: 18 Gerrit-Owner: wangsheng <[email protected]> Gerrit-Reviewer: Anonymous Coward (606) Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: wangsheng <[email protected]> Gerrit-Comment-Date: Mon, 10 Aug 2020 17:53:31 +0000 Gerrit-HasComments: Yes
