[email protected] has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/18894 )

Change subject: IMPALA-11507: Use absolute_path when Iceberg data files outside 
of the table location
......................................................................

IMPALA-11507: Use absolute_path when Iceberg data files outside of the table 
location

For Iceberg tables, when any of the following properties is set, the
table may have data files outside the table location directory:
- 'write.object-storage.enabled' is true
- 'write.data.path' is not empty
- 'write.location-provider.impl' is configured
- 'write.object-storage.path' (deprecated) is not empty
- 'write.folder-storage.path' (deprecated) is not empty
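
A minimal sketch of the property check listed above, for illustration only.
The function name and the dict-of-strings representation of table properties
are assumptions and are not taken from the patch. The sketch also assumes
that "configured" means a non-empty value.

```python
# Hypothetical helper, not Impala's actual API.
def may_have_data_outside_table_location(props):
    """Return True if any property suggests data files may live elsewhere.

    `props` is assumed to be a dict mapping property names to string values.
    """
    # 'write.object-storage.enabled' counts only when it is explicitly true.
    enabled = props.get('write.object-storage.enabled', 'false')
    if enabled.strip().lower() == 'true':
        return True
    # The remaining properties count when they hold any non-empty value.
    non_empty_keys = (
        'write.data.path',
        'write.location-provider.impl',
        'write.object-storage.path',  # deprecated
        'write.folder-storage.path',  # deprecated
    )
    return any(props.get(key, '').strip() for key in non_empty_keys)
```

A table with none of these properties set is treated as keeping all of its
data under the table location.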

We should tolerate the case where the path of a data file cannot be
expressed relative to the table location, and use the absolute path
instead. For example, an ETL program may write a table whose metadata is
placed in
'hdfs://nameservice_meta/warehouse/hadoop_catalog/ice_tbl/metadata',
whose recent data files are in
'hdfs://nameservice_data/warehouse/hadoop_catalog/ice_tbl/data', and
whose data files from half a year ago are in
's3a://nameservice_data/warehouse/hadoop_catalog/ice_tbl/data'. Impala
should still be able to query such a table normally.
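
The fallback described above can be sketched as follows. This is an
illustration under stated assumptions, not the code in the patch: the
function name, the returned tuple, and the plain string-prefix comparison are
all hypothetical, and the file name 'f.parquet' is made up for the example.

```python
# Hypothetical helper, not Impala's actual API.
def data_file_path(table_location, file_path):
    """Return (path, is_absolute) for a data file.

    If the file lies under the table location, return its path relative to
    that location. Otherwise keep the absolute path unchanged.
    """
    prefix = table_location.rstrip('/') + '/'
    if file_path.startswith(prefix):
        return file_path[len(prefix):], False
    return file_path, True


table = 'hdfs://nameservice_meta/warehouse/hadoop_catalog/ice_tbl'
inside = table + '/data/f.parquet'
outside = 's3a://nameservice_data/warehouse/hadoop_catalog/ice_tbl/data/f.parquet'

rel_result = data_file_path(table, inside)
abs_result = data_file_path(table, outside)
```

Here `rel_result` carries the relative path 'data/f.parquet', while
`abs_result` keeps the full 's3a://' path and is flagged as absolute.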

Testing:
 - added e2e tests

Change-Id: I666bed21d20d5895f4332e92eb30a94fa24250be
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/scheduling/scheduler.cc
M common/fbs/CatalogObjects.fbs
M common/protobuf/planner.proto
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/planner/ExplainTest.java
M fe/src/test/java/org/apache/impala/testutil/BlockIdGenerator.java
M testdata/data/README
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/metadata/42056022-e2d2-4548-9376-8993109c2ace-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/metadata/b5880d95-f4f1-49cb-ba55-143c221017fe-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/metadata/ce7ad1c8-1ad5-4391-a640-b203d7c476a4-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/metadata/snap-4264681048229339305-1-b5880d95-f4f1-49cb-ba55-143c221017fe.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/metadata/snap-4265463682522664668-1-ce7ad1c8-1ad5-4391-a640-b203d7c476a4.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/metadata/snap-7684033746298894981-1-42056022-e2d2-4548-9376-8993109c2ace.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/metadata/v4.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/metadata/v5.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/metadata/v6.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/metadata/version-hint.text
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations_data/col_int=0/00001-1-5a94b6af-6ee7-4910-9bf5-165a9a4e71df-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations_data/col_int=1/00001-1-5a94b6af-6ee7-4910-9bf5-165a9a4e71df-00002.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations_data01/col_int=1/00001-1-7ac79643-e19f-4294-914e-7b122aff576c-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations_data01/col_int=2/00001-1-7ac79643-e19f-4294-914e-7b122aff576c-00002.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations_data02/col_int=0/00001-1-26bc91ef-b403-4b65-a6b0-566396b8d097-00002.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations_data02/col_int=2/00001-1-26bc91ef-b403-4b65-a6b0-566396b8d097-00001.parquet
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-query/queries/QueryTest/iceberg-multiple-storage-locations-table.test
M tests/query_test/test_iceberg.py
38 files changed, 1,217 insertions(+), 75 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/18894/6
--
To view, visit http://gerrit.cloudera.org:8080/18894
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I666bed21d20d5895f4332e92eb30a94fa24250be
Gerrit-Change-Number: 18894
Gerrit-PatchSet: 6
Gerrit-Owner: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
