Daniel Becker has uploaded this change for review.
(http://gerrit.cloudera.org:8080/21269)
Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested
in complex types in select list
......................................................................
IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types
in select list
Binary fields in complex types are currently not supported at all for
regular tables (an error is returned). For Iceberg metadata tables,
IMPALA-12899 added a temporary workaround to allow queries that contain
these fields to succeed by NULLing them out. This change adds support
for displaying them using base64 encoding for both regular tables and
Iceberg metadata tables.
Complex types are displayed in JSON format, so simply inserting the
bytes of the binary fields is not acceptable as it would produce invalid
JSON: arbitrary bytes need not be valid UTF-8, and bytes such as quotes,
backslashes or control characters would break the string syntax. Base64
is a widely used encoding that allows representing arbitrary binary
information using only a limited set of ASCII characters.
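The invalid-JSON problem can be illustrated with a small Python sketch. It is not Impala's implementation (the actual change touches C++ files such as be/src/runtime/complex-value-writer.inline.h); it only assumes that a struct value is modeled as a Python dict and a BINARY field as `bytes`, and it uses the standard `base64` and `json` modules. The helper `encode_binary` is a hypothetical name that replaces every nested `bytes` value with its base64 text before serialization.

```python
import base64
import json

# Illustrative struct value with a BINARY field. 0xff never occurs in
# valid UTF-8, and 0x22 is a double quote that would end a JSON string.
row = {"id": 1, "payload": b"\x00\xff\"ab"}


def encode_binary(value):
    """Recursively replace bytes with their base64 text (ASCII only)."""
    if isinstance(value, bytes):
        return base64.b64encode(value).decode("ascii")
    if isinstance(value, dict):
        return {k: encode_binary(v) for k, v in value.items()}
    if isinstance(value, list):
        return [encode_binary(v) for v in value]
    return value


# The raw bytes are not valid UTF-8 text, so they cannot be copied into JSON.
try:
    row["payload"].decode("utf-8")
    raw_is_text = True
except UnicodeDecodeError:
    raw_is_text = False

encoded_json = json.dumps(encode_binary(row))
print(raw_is_text)   # False
print(encoded_json)  # {"id": 1, "payload": "AP8iYWI="}

# No information is lost: the original bytes can be recovered.
decoded = base64.b64decode(json.loads(encoded_json)["payload"])
assert decoded == row["payload"]
```

The spacing in the JSON text is Python's `json.dumps` default and is not meant to mirror Impala's exact output format.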
This change also adds support for top-level binary columns in Iceberg
metadata tables. However, these are not base64 encoded but are returned
as raw bytes; this is consistent with how top-level binary columns from
regular (non-metadata) tables are handled.
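The split between top-level and nested binary values can be sketched in Python under the same modeling assumptions (`bytes` for BINARY, dicts and lists for complex types). `render_column` and `encode_binary` are hypothetical names for illustration, and the field name `key_metadata` is only an example; the sketch mirrors the rule described above, not Impala's code.

```python
import base64
import json
from typing import Any


def encode_binary(value: Any) -> Any:
    """Recursively base64-encode bytes found inside dicts and lists."""
    if isinstance(value, bytes):
        return base64.b64encode(value).decode("ascii")
    if isinstance(value, dict):
        return {k: encode_binary(v) for k, v in value.items()}
    if isinstance(value, list):
        return [encode_binary(v) for v in value]
    return value


def render_column(value: Any) -> Any:
    """Render one select-list value following the described rule."""
    if isinstance(value, (dict, list)):
        # Complex type: rendered as JSON text, nested BINARY as base64.
        return json.dumps(encode_binary(value))
    # Scalars, including top-level BINARY (bytes), are returned unchanged.
    return value


top_level = render_column(b"\x01\x02")
nested = render_column({"key_metadata": b"\x01\x02"})
print(top_level)  # b'\x01\x02'
print(nested)     # {"key_metadata": "AQI="}
```

Keeping top-level values raw matches how regular tables already return binary columns, so only the JSON rendering path needs the extra encoding step.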
Testing:
- added test queries in iceberg-metadata-tables.test referencing both
nested and top-level binary fields; also updated existing queries
- moved relevant tests (queries extracting binary fields from within
complex types) from nested-types-scanner-basic.test to a new
binary-in-complex-type.test file and also added a query that selects
the containing complex types; this new test file is run from
test_scanners.py::TestBinaryInComplexType::test_binary_in_complex_type
- moved negative tests in AnalyzerTest.TestUnsupportedTypes() to
AnalyzeStmtsTest.TestComplexTypesInSelectList() and converted them to
positive tests (expecting success); a negative test already in
AnalyzeStmtsTest.TestComplexTypesInSelectList() was also converted to a
positive test
Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2
---
M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc
M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h
M be/src/exec/iceberg-metadata/iceberg-row-reader.cc
M be/src/exec/iceberg-metadata/iceberg-row-reader.h
M be/src/rpc/jni-thrift-util.h
M be/src/runtime/complex-value-writer.inline.h
M be/src/util/jni-util.cc
M be/src/util/jni-util.h
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M testdata/data/README
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/data/00000-0-data-danielbecker_20240408174043_c3737eaf-db30-4b88-aafb-f23c0f3c1dd3-job_17125053806420_0002-1-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/64da0e56-efa3-4025-bef1-1047fdd9a2b0-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/snap-3079551887386250470-1-64da0e56-efa3-4025-bef1-1047fdd9a2b0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/version-hint.txt
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-query/queries/QueryTest/binary-in-complex-type.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-basic.test
M tests/query_test/test_scanners.py
26 files changed, 438 insertions(+), 150 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/21269/1
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2
Gerrit-Change-Number: 21269
Gerrit-PatchSet: 1
Gerrit-Owner: Daniel Becker