wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/16143 )
Change subject: IMPALA-9741: Support querying Iceberg table by impala ...................................................................... Patch Set 21: (7 comments) http://gerrit.cloudera.org:8080/#/c/16143/20/fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java File fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java: http://gerrit.cloudera.org:8080/#/c/16143/20/fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java@86 PS20, Line 86: partitioned = true; > nit: probably just call this variable 'partitioned', having a negation in a Done http://gerrit.cloudera.org:8080/#/c/16143/20/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java File fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java: http://gerrit.cloudera.org:8080/#/c/16143/20/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java@134 PS20, Line 134: // LocalFsTable transformed from iceberg table only has one partition > nit: then maybe we could use a Preconditions check to verify this. Also, we Done http://gerrit.cloudera.org:8080/#/c/16143/20/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java: http://gerrit.cloudera.org:8080/#/c/16143/20/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@59 PS20, Line 59: icebergConjuncts > icebergConjuncts_ Done http://gerrit.cloudera.org:8080/#/c/16143/20/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@60 PS20, Line 60: icebergPredicates > icebergPredicates_ Done http://gerrit.cloudera.org:8080/#/c/16143/20/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@117 PS20, Line 117: 1 > In case 1 we cannot filter data files because the column is not part of the We can filter data files even column is not part of the partition expressions in some situation. Iceberg will reserve some data files' statistics in it's metadata, so we can use these non-partition column predicates to filter data files, just like impala use parquet file statistics to skip unnecessary file scan. http://gerrit.cloudera.org:8080/#/c/16143/20/testdata/data/iceberg_test/iceberg_non_partitioned/metadata/v1.metadata.json File testdata/data/iceberg_test/iceberg_non_partitioned/metadata/v1.metadata.json: PS20: > So there's a problem with these test tables. They have hdfs://localhost:205 Hi Tim, do you mean that I need to use '$DEFAULT_FS' to replace 'hdfs://localhost:20500' in these metadata files, and then replace '$DEFAULT_FS' by real file system before data loading? If so, would you please give me some advice, such as how to get real file system, which script to be modified is better and so on. Thanks a lot! http://gerrit.cloudera.org:8080/#/c/16143/20/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test: http://gerrit.cloudera.org:8080/#/c/16143/20/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@4 PS20, Line 4: SELECT count(*) from ic > nit: it can be omitted Done -- To view, visit http://gerrit.cloudera.org:8080/16143 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006 Gerrit-Change-Number: 16143 Gerrit-PatchSet: 21 Gerrit-Owner: wangsheng <[email protected]> Gerrit-Reviewer: Anonymous Coward (606) Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: wangsheng <[email protected]> Gerrit-Comment-Date: Fri, 28 Aug 2020 13:31:46 +0000 Gerrit-HasComments: Yes
