Tamas Mate has uploaded a new patch set (#15). ( http://gerrit.cloudera.org:8080/20010 )

Change subject: IMPALA-11996: Scanner change for Iceberg metadata querying
......................................................................

IMPALA-11996: Scanner change for Iceberg metadata querying

This commit adds a scan node for querying Iceberg metadata tables. The
scan node creates a Java scanner object that creates and scans the
metadata table. The scanner uses the Iceberg API to scan the table; the
scan node then fetches the rows one by one and materialises them into
RowBatches. The Iceberg row reader on the backend translates between
Iceberg and Impala types.

Only one fragment is created to query the Iceberg metadata table. It is
supposed to be executed on the coordinator node, which already has the
Iceberg table loaded, so no further table loading is needed on the
executor side.

This change does not cover nested column types: these slots are set to
NULL, and support will be added in IMPALA-12205.

Testing:
 - Added e2e tests for querying metadata tables
 - Updated planner tests

Performance testing:
Created a table and inserted ~5500 rows one by one, which generated
~270000 ALL_MANIFESTS metadata table records. This table is quite wide
and has a String column as well.
I only mention the count(*) test on ALL_MANIFESTS, because every row is
currently materialized in every scenario:
 - Cold cache: 15.76s
   - IcebergApiScanTime: 124.407ms
   - MaterializeTupleTime: 8s368ms
 - Warm cache: 7.56s
   - IcebergApiScanTime: 3.646ms
   - MaterializeTupleTime: 7s477ms

Change-Id: I0e943cecd77f5ef7af7cd07e2b596f2c5b4331e7
---
M be/CMakeLists.txt
M be/src/exec/CMakeLists.txt
M be/src/exec/exec-node.cc
A be/src/exec/iceberg-metadata/CMakeLists.txt
A be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc
A be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h
A be/src/exec/iceberg-metadata/iceberg-row-reader.cc
A be/src/exec/iceberg-metadata/iceberg-row-reader.h
M be/src/scheduling/scheduler.cc
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/service/impalad-main.cc
M be/src/util/jni-util.cc
M be/src/util/jni-util.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/IcebergMetadataTableRef.java
M fe/src/main/java/org/apache/impala/analysis/Path.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/IcebergMetadataScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
A fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-metadata-table-scan.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
M tests/authorization/test_ranger.py
M tests/query_test/test_iceberg.py
32 files changed, 1,419 insertions(+), 167 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/20010/15
--
To view, visit http://gerrit.cloudera.org:8080/20010
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0e943cecd77f5ef7af7cd07e2b596f2c5b4331e7
Gerrit-Change-Number: 20010
Gerrit-PatchSet: 15
Gerrit-Owner: Tamas Mate <tma...@apache.org>
Gerrit-Reviewer: Anonymous Coward <lipeng...@apache.org>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <g.furnst...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pro...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tma...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>