Daniel Becker has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21269


Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested 
in complex types in select list
......................................................................

IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types 
in select list

Binary fields in complex types are currently not supported at all for
regular tables (an error is returned). For Iceberg metadata tables,
IMPALA-12899 added a temporary workaround to allow queries that contain
these fields to succeed by NULLing them out. This change adds support
for displaying them with base64 encoding for both regular and Iceberg
metadata tables.

Complex types are displayed in JSON format, so simply inserting the
bytes of the binary fields is not acceptable as it would produce invalid
JSON. Base64 is a widely used encoding that allows representing
arbitrary binary information using only a limited set of ASCII
characters.

This change also adds support for top level binary columns in Iceberg
metadata tables. However, these are not base64 encoded but are returned
in raw byte format - this is consistent with how top level binary
columns from regular (non-metadata) tables are handled.

Testing:
 - added test queries in iceberg-metadata-tables.test referencing both
   nested and top level binary fields; also updated existing queries
 - moved relevant tests (queries extracting binary fields from within
   complex types) from nested-types-scanner-basic.test to a new
   binary-in-complex-type.test file and also added a query that selects
   the containing complex types; this new test file is run from
   test_scanners.py::TestBinaryInComplexType::\
     test_binary_in_complex_type
 - moved negative tests in AnalyzerTest.TestUnsupportedTypes() to
   AnalyzeStmtsTest.TestComplexTypesInSelectList() and converted them to
   positive tests (expecting success); a negative test already in
   AnalyzeStmtsTest.TestComplexTypesInSelectList() was also converted

Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2
---
M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc
M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h
M be/src/exec/iceberg-metadata/iceberg-row-reader.cc
M be/src/exec/iceberg-metadata/iceberg-row-reader.h
M be/src/rpc/jni-thrift-util.h
M be/src/runtime/complex-value-writer.inline.h
M be/src/util/jni-util.cc
M be/src/util/jni-util.h
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M testdata/data/README
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/data/00000-0-data-danielbecker_20240408174043_c3737eaf-db30-4b88-aafb-f23c0f3c1dd3-job_17125053806420_0002-1-00001.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/64da0e56-efa3-4025-bef1-1047fdd9a2b0-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/snap-3079551887386250470-1-64da0e56-efa3-4025-bef1-1047fdd9a2b0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/version-hint.txt
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A 
testdata/workloads/functional-query/queries/QueryTest/binary-in-complex-type.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-basic.test
M tests/query_test/test_scanners.py
26 files changed, 438 insertions(+), 150 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/21269/1
--
To view, visit http://gerrit.cloudera.org:8080/21269
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2
Gerrit-Change-Number: 21269
Gerrit-PatchSet: 1
Gerrit-Owner: Daniel Becker <daniel.bec...@cloudera.com>

Reply via email to