Zoltan Borok-Nagy has uploaded this change for review.
( http://gerrit.cloudera.org:8080/18847 )
Change subject: WIP: IMPALA-11484: Create SCAN plan for Iceberg V2 position
delete tables
......................................................................
WIP: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
This patch adds support for reading Iceberg V2 tables that use position
deletes. Equality deletes are still not supported. Position delete
files store the file path and file position of the deleted rows.
When an Iceberg table has position delete files we need to do an
ANTI JOIN between data files and delete files. From the data files
we need to query the virtual columns INPUT__FILE__NAME and
FILE__POSITION, while from the delete files we need the data columns
'file_path' and 'pos'. The latter data columns are added as 'hidden
columns' to Iceberg tables. 'Hidden column' is a new concept introduced
by this patch.
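To illustrate the ANTI JOIN semantics described above, here is a minimal
sketch in plain Python. It is not Impala code: the function name and the
in-memory tuples are hypothetical, and it only models the idea that a data
row survives unless its (INPUT__FILE__NAME, FILE__POSITION) pair matches a
(file_path, pos) record in a delete file.

```python
# Illustrative model only; names and data layout are assumptions,
# not part of the patch.

def apply_position_deletes(data_rows, delete_records):
    """Return the data rows that have no matching position delete.

    data_rows: iterable of (input_file_name, file_position, payload)
    delete_records: iterable of (file_path, pos)
    """
    # Build the lookup side of the anti join from the delete records.
    deleted = set(delete_records)
    # Keep a row only if no delete record matches its file and position.
    return [row for row in data_rows if (row[0], row[1]) not in deleted]


data_rows = [
    ("a.parquet", 0, "x"),
    ("a.parquet", 1, "y"),
    ("b.parquet", 0, "z"),
]
delete_records = [("a.parquet", 1)]

surviving = apply_position_deletes(data_rows, delete_records)
```

In this example the row at position 1 of a.parquet is filtered out, while
the other two rows pass through unchanged.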
This patch introduces a new class 'IcebergScanPlanner' which is
responsible for planning Iceberg table scans. It creates
the aforementioned ANTI JOIN. Also, if there are data files without
corresponding delete files, we can have a separate SCAN node whose
results are UNIONed with the rows coming from the ANTI JOIN:
            UNION
           /     \
  SCAN data       ANTI JOIN
                 /         \
        SCAN data           SCAN deletes
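The plan shape above can be sketched roughly as follows. This is a
simplified Python model, not the IcebergScanPlanner implementation: the
function names, the tuple-based plan representation, and the input mapping
from each data file to its delete files are all assumptions made for
illustration. How the planner actually finds the delete files for a data
file is not described here.

```python
# Illustrative model only; all names here are hypothetical.

def split_data_files(deletes_by_data_file):
    """Split data files into those without and those with delete files.

    deletes_by_data_file: dict mapping a data file path to the list of
    delete file paths that apply to it (possibly empty).
    """
    without_deletes = sorted(f for f, d in deletes_by_data_file.items() if not d)
    with_deletes = sorted(f for f, d in deletes_by_data_file.items() if d)
    return without_deletes, with_deletes


def describe_plan(deletes_by_data_file):
    """Return a nested tuple describing the shape of the scan plan."""
    without_deletes, with_deletes = split_data_files(deletes_by_data_file)
    if not with_deletes:
        # Assumption: with no delete files a plain SCAN is enough.
        return ("SCAN", without_deletes)
    delete_files = sorted(
        {d for f in with_deletes for d in deletes_by_data_file[f]})
    anti_join = ("ANTI JOIN", ("SCAN", with_deletes), ("SCAN", delete_files))
    if not without_deletes:
        # Every data file has deletes, so no separate SCAN is needed.
        return anti_join
    # Data files without deletes bypass the join and are UNIONed in.
    return ("UNION", ("SCAN", without_deletes), anti_join)


plan = describe_plan({"a.parquet": ["d1.parquet"], "b.parquet": []})
```

Here b.parquet has no delete files, so it is scanned directly, while
a.parquet goes through the ANTI JOIN against d1.parquet.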
Predicate pushdown and time travel logic is transferred from
IcebergScanNode to IcebergScanPlanner.
TODO:
* better cardinality estimates
* handling complex types
* add tests
Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_positional/data/00191-4-6e780302-527b-4911-8c6e-88d416adac57-00001.parquet
18 files changed, 776 insertions(+), 472 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/18847/1
--
To view, visit http://gerrit.cloudera.org:8080/18847
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>