Hello Zoltan Borok-Nagy, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/24088
to look at the new patch set (#2).
Change subject: IMPALA-14589: Add support for Iceberg V3 default values
......................................................................
IMPALA-14589: Add support for Iceberg V3 default values
Iceberg V3 introduces two types of default values to enable non-blocking schema
evolution. This patch adds support for both in Impala.
1. **initial-default**: Applied when READING old data files that were written
before a column was added to the schema. This allows adding columns with
defaults without rewriting existing data files. The default value is
materialized at read time for missing columns.
2. **write-default**: Applied when WRITING new rows that don't specify a value
for a column. This ensures consistent defaults across all engines writing to
the table, regardless of whether they're inserting via Spark, Flink, or
Impala.
How Default Values Work:
- Default values are stored in the Iceberg table's schema metadata (not in
data files)
- When scanning a file missing a column, Impala checks the schema for
initial-default
- If present, the value is materialized in the template tuple before scanning
- For writes, InsertStmt checks for write-default on unmentioned columns
- Both default types are represented as JSON-serialized literals in the schema
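
A minimal Python sketch of the two lookups described above, for illustration
only. It is not the code in this patch: the schema fragment, column names, and
helper functions are hypothetical, and only the "initial-default" and
"write-default" keys are taken from the Iceberg schema JSON format. Columns
are matched by field ID, as Iceberg does.

```python
import json

# Hypothetical schema fragment. The "initial-default" / "write-default" keys
# follow Iceberg's JSON schema serialization; names and values are made up.
SCHEMA_JSON = """
{"type": "struct", "fields": [
  {"id": 1, "name": "id", "required": true, "type": "int"},
  {"id": 2, "name": "region", "required": false, "type": "string",
   "initial-default": "unknown", "write-default": "emea"}
]}
"""


def load_fields(schema_json):
    """Decode the JSON-serialized schema into a list of field dicts."""
    return json.loads(schema_json)["fields"]


def read_template(fields, file_field_ids):
    """Read path: values to materialize for columns absent from a data file.

    A column without an initial-default maps to None (NULL).
    """
    return {f["name"]: f.get("initial-default")
            for f in fields if f["id"] not in file_field_ids}


def fill_write_defaults(fields, row):
    """Write path: complete a row for columns the INSERT did not mention."""
    return {f["name"]: row[f["name"]] if f["name"] in row
            else f.get("write-default")
            for f in fields}
```

For a data file written before "region" was added (it only contains field ID
1), read_template(load_fields(SCHEMA_JSON), {1}) yields the value to place in
the template, while fill_write_defaults(load_fields(SCHEMA_JSON), {"id": 7})
fills "region" from write-default.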
Testing:
- Updated iceberg-v3-negative.test with schema evolution expectations
- Added iceberg-v3-default-values.test covering:
* Read path: SELECT, WHERE, aggregations, JOINs, UNION, subqueries
* Write path: INSERT with partial columns
* Time travel: Schema evolution across snapshots
Change-Id: I9f1be994a336b30b17b17819091417d777a39be9
---
M be/src/exec/avro/hdfs-avro-scanner.cc
M be/src/exec/file-metadata-utils.cc
M be/src/exec/orc/hdfs-orc-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/CatalogObjects.thrift
M common/thrift/Descriptors.thrift
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/IcebergColumn.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java
A testdata/workloads/functional-query/queries/QueryTest/iceberg-v3-default-values.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-v3-negative.test
M tests/query_test/test_iceberg.py
16 files changed, 463 insertions(+), 94 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/24088/2
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9f1be994a336b30b17b17819091417d777a39be9
Gerrit-Change-Number: 24088
Gerrit-PatchSet: 2
Gerrit-Owner: Arnab Karmakar <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>