Arnab Karmakar has uploaded this change for review. ( http://gerrit.cloudera.org:8080/24505 )
Change subject: IMPALA-15083: [Part 1] Add Parquet UUID read support for
Iceberg tables
......................................................................
IMPALA-15083: [Part 1] Add Parquet UUID read support for Iceberg tables

This patch enables Impala to read UUID columns from Iceberg tables
stored in Parquet format. Parquet stores a UUID as a
`FIXED_LEN_BYTE_ARRAY(16)` annotated with the UUID logical type.

Impala reads the 16-byte UUID value from Parquet and converts it to a
36-character canonical string representation (8-4-4-4-12 format) at
query time. This approach provides readable output while maintaining
compatibility with standard UUID string formats.
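
The conversion described above can be sketched as follows. This is only an
illustration, not the code in be/src/util/uuid-util.h: the function name
`UuidBytesToString` is hypothetical, and the sketch assumes lowercase hex
output and the big-endian byte order that the Parquet UUID logical type
specifies.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not the patch's actual API): formats the 16 raw bytes
// of a Parquet UUID value as a 36-character 8-4-4-4-12 string.
// Assumption: lowercase hex digits, bytes taken in stored (big-endian) order.
std::string UuidBytesToString(const std::array<uint8_t, 16>& bytes) {
  static const char kHex[] = "0123456789abcdef";
  std::string out;
  out.reserve(36);
  for (std::size_t i = 0; i < bytes.size(); ++i) {
    // Hyphens go before bytes 4, 6, 8 and 10, giving groups of 8-4-4-4-12.
    if (i == 4 || i == 6 || i == 8 || i == 10) out.push_back('-');
    out.push_back(kHex[bytes[i] >> 4]);    // high nibble
    out.push_back(kHex[bytes[i] & 0x0F]);  // low nibble
  }
  return out;
}
```

For example, the bytes 0x00 through 0x0f in order would be rendered as
00010203-0405-0607-0809-0a0b0c0d0e0f.
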
This commit enables full read support for Parquet-based Iceberg
UUID columns, including filtering, aggregation, sorting, and joins.
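
One property that is relevant to comparison and sorting on the string form:
if the canonical string uses lowercase hex (an assumption here, the commit
message does not state the case), then string order matches the byte order
of the underlying 16-byte values. The sketch below checks this for a pair
of values; `FormatUuid` and `OrderingsAgree` are hypothetical names used
only for illustration and are not part of the patch.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

using UuidBytes = std::array<uint8_t, 16>;

// Hypothetical formatter: 8-4-4-4-12 layout, lowercase hex (assumed).
std::string FormatUuid(const UuidBytes& b) {
  static const char kHex[] = "0123456789abcdef";
  std::string out;
  for (std::size_t i = 0; i < b.size(); ++i) {
    if (i == 4 || i == 6 || i == 8 || i == 10) out.push_back('-');
    out.push_back(kHex[b[i] >> 4]);
    out.push_back(kHex[b[i] & 0x0F]);
  }
  return out;
}

// Returns true when comparing the raw bytes and comparing the formatted
// strings give the same result (less, equal, or greater) for a and b.
bool OrderingsAgree(const UuidBytes& a, const UuidBytes& b) {
  const int byte_cmp = std::memcmp(a.data(), b.data(), a.size());
  const int str_cmp = FormatUuid(a).compare(FormatUuid(b));
  auto sign = [](int v) { return (v > 0) - (v < 0); };
  return sign(byte_cmp) == sign(str_cmp);
}
```

The two orders agree because the hyphens sit at the same positions in every
string and the ASCII codes of '0'-'9' and 'a'-'f' increase with the nibble
value. With uppercase or mixed-case hex this would still hold only if the
case were used consistently.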

Testing:
- Added a comprehensive query test suite (iceberg-uuid-type.test) covering:
  * Basic SELECT queries with UUID columns
  * WHERE clause predicates (equality, comparison, IN, BETWEEN)
  * JOIN operations with UUID columns
  * Aggregation (COUNT, MIN, MAX, NDV, GROUP BY) and negative
    tests for unsupported SUM/AVG
  * Iceberg V3 initial-default backfill for UUID columns
  * Iceberg bucket(4, uuid_col) partition transform pruning
  * EXPLAIN plans verifying predicate pushdown and partition pruning
  * Negative tests confirming ORC/Avro formats are rejected with clear errors
- Test data includes four Iceberg table variants (Parquet with/without
  partitioning, ORC, Avro) generated via the Iceberg Java API

Change-Id: I4157c002e80677d27d8fd060c7bfa07b95d7c78f
Assisted-by: Composer 2.5
---
M be/src/exec/parquet/parquet-column-chunk-reader.h
M be/src/exec/parquet/parquet-column-readers.cc
M be/src/exec/parquet/parquet-column-stats.cc
M be/src/exec/parquet/parquet-column-stats.h
M be/src/exec/parquet/parquet-common.h
M be/src/exec/parquet/parquet-data-converter.h
M be/src/exec/parquet/parquet-metadata-utils.cc
M be/src/exec/text-converter.inline.h
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/cast-functions.h
A be/src/util/uuid-util.h
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M fe/src/main/java/org/apache/impala/catalog/ColumnStats.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/common/IcebergPredicateConverter.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/data/README
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test/data/00000-0-4506acbe-4b67-470a-a55e-b10babd0d936-1-00001.parquet
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test/data/00000-0-6a690319-54bc-488d-ad09-72e0728adcf1-1-00001.parquet
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test/data/00000-0-87520c0c-721e-47c2-a13e-c532a6d5506d-1-00001.parquet
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test/data/00000-0-a1b65880-dbb4-4723-8a62-38ab685f57a5-1-00001.parquet
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test/data/00000-0-c8b87271-588a-450b-b893-fe3503b9e0e0-1-00001.parquet
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test/metadata/5dbfe621-ea8b-42ef-bf71-22924ebe9a4c-m0.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test/metadata/6bab8911-ac05-48c9-a5f5-840b2f5b2158-m0.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test/metadata/a524d5c5-eb17-492a-b20a-aef9dfaf77eb-m0.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test/metadata/a8a0fea1-9d07-4af1-a580-c28fe74d58a9-m0.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test/metadata/f3bde471-ff6e-433a-b184-71564eac14f2-m0.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test/metadata/snap-1854301121202970963-1-6bab8911-ac05-48c9-a5f5-840b2f5b2158.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test/metadata/snap-4744360932271692267-1-a8a0fea1-9d07-4af1-a580-c28fe74d58a9.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test/metadata/snap-4979207191920000664-1-5dbfe621-ea8b-42ef-bf71-22924ebe9a4c.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test/metadata/snap-5570936655938254077-1-f3bde471-ff6e-433a-b184-71564eac14f2.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test/metadata/snap-6711875576164514304-1-a524d5c5-eb17-492a-b20a-aef9dfaf77eb.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test/metadata/v1.metadata.json
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test/metadata/v2.metadata.json
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test/metadata/v3.metadata.json
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test/metadata/v4.metadata.json
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test/metadata/v5.metadata.json
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test/metadata/v6.metadata.json
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test/metadata/v7.metadata.json
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test/metadata/version-hint.text
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_avro/data/00000-0-36a10355-4da3-48cc-baa7-78a818130e15-1-00001.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_avro/data/00000-0-4e7c2ed8-d3ce-484c-8bc5-d0bd095c46d1-1-00001.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_avro/data/00000-0-e1248e53-2a33-434c-ba9e-4b0cde09f23a-1-00001.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_avro/data/00000-0-e290bcce-35c3-48d5-b592-da066823f124-1-00001.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_avro/data/00000-0-e6f6ea8e-0241-44c6-b214-1aa107927521-1-00001.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_avro/metadata/2b500dae-2f17-4139-a3ab-749bf7d4d2b4-m0.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_avro/metadata/2f0ca499-85fc-4cf7-a4d7-abf2f4a18a46-m0.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_avro/metadata/c0ce300a-9874-4199-aa3a-abcae5b02ef7-m0.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_avro/metadata/d1bbde7d-1667-454f-b354-5661c08c8e94-m0.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_avro/metadata/d2878e39-671e-434e-a1cf-e6f49bb218b5-m0.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_avro/metadata/snap-1257189622857586261-1-d2878e39-671e-434e-a1cf-e6f49bb218b5.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_avro/metadata/snap-5325065939761959235-1-2b500dae-2f17-4139-a3ab-749bf7d4d2b4.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_avro/metadata/snap-5960700658108006852-1-c0ce300a-9874-4199-aa3a-abcae5b02ef7.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_avro/metadata/snap-6779792615814615855-1-d1bbde7d-1667-454f-b354-5661c08c8e94.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_avro/metadata/snap-9054099040941058343-1-2f0ca499-85fc-4cf7-a4d7-abf2f4a18a46.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_avro/metadata/v1.metadata.json
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_avro/metadata/v2.metadata.json
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_avro/metadata/v3.metadata.json
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_avro/metadata/v4.metadata.json
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_avro/metadata/v5.metadata.json
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_avro/metadata/v6.metadata.json
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_avro/metadata/version-hint.text
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_bucket_part/data/uuid_col_bucket=1/00000-0-c962e3c2-e416-4ef6-84b6-1fa27a07e13a-1-00001.parquet
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_bucket_part/data/uuid_col_bucket=2/00000-0-24b210a8-0071-4402-9fcb-3dd3ce87e9b1-1-00001.parquet
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_bucket_part/data/uuid_col_bucket=2/00000-0-4283dd33-dfee-423c-9a01-648760d6b2c5-1-00001.parquet
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_bucket_part/metadata/06eaa4cc-9cfa-4bb3-b409-13be3c6fc19e-m0.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_bucket_part/metadata/63e4ba94-0cdf-44cc-84f7-ddf48a5bfa72-m0.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_bucket_part/metadata/f60a3796-9b12-4584-8447-69a2aa2564ff-m0.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_bucket_part/metadata/snap-1431857633119099754-1-63e4ba94-0cdf-44cc-84f7-ddf48a5bfa72.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_bucket_part/metadata/snap-1467529946775558038-1-06eaa4cc-9cfa-4bb3-b409-13be3c6fc19e.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_bucket_part/metadata/snap-6074002651835100957-1-f60a3796-9b12-4584-8447-69a2aa2564ff.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_bucket_part/metadata/v1.metadata.json
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_bucket_part/metadata/v2.metadata.json
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_bucket_part/metadata/v3.metadata.json
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_bucket_part/metadata/v4.metadata.json
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_bucket_part/metadata/version-hint.text
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_orc/data/00000-0-2e0fbfe3-d923-4a27-b5d4-f941a82fb1d1-1-00001.orc
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_orc/data/00000-0-896e60f8-2a60-47e3-8187-8f8bed145b7f-1-00001.orc
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_orc/data/00000-0-b9172fb6-1d4c-46e0-817c-b3d6377ccf31-1-00001.orc
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_orc/data/00000-0-d7da9c66-c7e5-45e0-a69a-0c2eb82e96c0-1-00001.orc
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_orc/data/00000-0-eef53fbc-bade-487e-b510-537354cf46e0-1-00001.orc
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_orc/metadata/22fa5c4b-83a3-46b2-bafa-3553189862bd-m0.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_orc/metadata/4abe1f4e-1220-4b05-a5e6-ebfc8fb92df3-m0.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_orc/metadata/547a17ba-a0d2-4921-956b-e06fbdadb60b-m0.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_orc/metadata/75312a76-0d22-45dc-80c7-41c2cb712f98-m0.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_orc/metadata/d570e583-e57a-4879-8a99-052c9a85d706-m0.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_orc/metadata/snap-5116163803842658326-1-4abe1f4e-1220-4b05-a5e6-ebfc8fb92df3.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_orc/metadata/snap-5347252638781720483-1-d570e583-e57a-4879-8a99-052c9a85d706.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_orc/metadata/snap-5748673676392974692-1-75312a76-0d22-45dc-80c7-41c2cb712f98.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_orc/metadata/snap-6680088367048559051-1-22fa5c4b-83a3-46b2-bafa-3553189862bd.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_orc/metadata/snap-7102485323114575692-1-547a17ba-a0d2-4921-956b-e06fbdadb60b.avro
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_orc/metadata/v1.metadata.json
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_orc/metadata/v2.metadata.json
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_orc/metadata/v3.metadata.json
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_orc/metadata/v4.metadata.json
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_orc/metadata/v5.metadata.json
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_orc/metadata/v6.metadata.json
A testdata/data/iceberg_test/iceberg_uuid/iceberg_uuid_test_orc/metadata/version-hint.text
A testdata/workloads/functional-query/queries/QueryTest/iceberg-uuid-type.test
M tests/query_test/test_iceberg.py
104 files changed, 976 insertions(+), 53 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/24505/1
--
To view, visit http://gerrit.cloudera.org:8080/24505
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I4157c002e80677d27d8fd060c7bfa07b95d7c78f
Gerrit-Change-Number: 24505
Gerrit-PatchSet: 1
Gerrit-Owner: Arnab Karmakar <[email protected]>