Hello Arnab Karmakar, Peter Rozsa, Noemi Pap-Takacs, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/24042

to look at the new patch set (#3).

Change subject: IMPALA-14592: Read Row Lineage of Iceberg tables
......................................................................

IMPALA-14592: Read Row Lineage of Iceberg tables

Iceberg V3 added mandatory row lineage tracking for Iceberg tables.
This means each row has a row-id and a last-updated-sequence-number
associated with it. These are either stored in the data files or
calculated from file metadata as follows:

* row-id: the _row_id field of the record. If it is missing or NULL,
  then it is the first-row-id of the DataFile plus FILE__POSITION.
* last-updated-sequence-number: the _last_updated_sequence_number field
  of the record. If it is missing or NULL, then it is the
  data-sequence-number of the DataFile.
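
The fallback rules above can be sketched in a few lines of Python. This
is only an illustration of the rules as described here, not code from
the patch: the function and parameter names are hypothetical, and the
sketch assumes the first-row-id and data-sequence-number of the DataFile
are always known.

```python
from typing import Optional


def resolve_row_id(stored_row_id: Optional[int],
                   first_row_id: int,
                   file_position: int) -> int:
    """Return the row-id of a record (hypothetical helper).

    stored_row_id is the _row_id value read from the data file, or None
    if the column is missing or the value is NULL.
    """
    if stored_row_id is not None:
        return stored_row_id
    # Fall back to file metadata: first-row-id plus position in the file.
    return first_row_id + file_position


def resolve_last_updated_seq(stored_seq: Optional[int],
                             data_sequence_number: int) -> int:
    """Return the last-updated-sequence-number of a record (hypothetical helper).

    stored_seq is the _last_updated_sequence_number value read from the
    data file, or None if the column is missing or the value is NULL.
    """
    if stored_seq is not None:
        return stored_seq
    # Fall back to the data-sequence-number of the DataFile.
    return data_sequence_number
```

The explicit "is not None" checks matter: a stored row-id of 0 is a
valid value and must not trigger the fallback.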

To support Row Lineage in Impala, we introduce the concept of Hidden
Columns. Hidden Columns are columns of a table that can be stored in
the data files along with the data, but they don't participate in
'select *' expansion and they are non-modifiable. Some DBs refer to such
columns as "system columns". They are different from Virtual Columns
as Virtual Columns are not stored in the data files.

We introduce the following Hidden Columns:
* _file_row_id: BIGINT field with field id 2147483540.
* _file_last_updated_sequence_number: BIGINT field with field id
  2147483539.

We also introduce the following Virtual Column:
* ICEBERG__FIRST__ROW__ID: returns the first-row-id of the DataFile.
  This is stored in the metadata once per data file; it is not present
  in the data files themselves.

Now we can calculate Iceberg V3 row-id and last-updated-sequence-number
the following way:

* row-id:
  COALESCE(_file_row_id,
           ICEBERG__FIRST__ROW__ID + FILE__POSITION)
* last-updated-sequence-number:
  COALESCE(_file_last_updated_sequence_number,
           ICEBERG__DATA__SEQUENCE__NUMBER)

Later we might add syntactic sugar for the above. For now, this patch
set only makes it possible to calculate the values via the above
expressions.

Testing
 * e2e tests added with Iceberg V3 tables written by Spark

Change-Id: I71b1076b25c9e7a0a6c9428b24abc986f5382c71
---
M be/src/exec/file-metadata-utils.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/fbs/IcebergObjects.fbs
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/AlterTableAlterColStmt.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableDropColStmt.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/FeTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergColumn.java
M fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTimeTravelTable.java
M fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java
M fe/src/main/java/org/apache/impala/catalog/local/IcebergMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/data/README
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/data/00000-0-153001a8-dc43-4e8b-ad61-b691a1754e16-0-00001.parquet
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/data/00000-1-9e4c5793-eb01-410d-a963-807e22437794-0-00001.parquet
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/data/00000-1-e55b64a3-1aa3-4a3c-87a1-cd3d2988c499-0-00001.parquet
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/data/00000-2-d67e29ee-b654-4420-a7a5-9d7964ffd9c9-0-00001.parquet
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/metadata/7411e291-ddc0-4c54-9e25-75ef7878df0d-m0.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/metadata/7411e291-ddc0-4c54-9e25-75ef7878df0d-m1.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/metadata/7411e291-ddc0-4c54-9e25-75ef7878df0d-m2.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/metadata/7411e291-ddc0-4c54-9e25-75ef7878df0d-m3.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/metadata/7a6ede87-b2d9-462e-9baa-77e456f07671-m0.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/metadata/8ea2cf61-8fe7-4599-923a-d64b424cae3f-m0.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/metadata/e46e6fcd-0a4e-4001-a0db-e199a5eb4227-m0.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/metadata/fe2e965b-4685-4369-babf-31d13f81f10a-m0.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/metadata/snap-2872597867664652808-1-fe2e965b-4685-4369-babf-31d13f81f10a.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/metadata/snap-5398841822738664432-1-7411e291-ddc0-4c54-9e25-75ef7878df0d.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/metadata/snap-7384452996480084466-1-e46e6fcd-0a4e-4001-a0db-e199a5eb4227.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/metadata/snap-8059325670730066324-1-8ea2cf61-8fe7-4599-923a-d64b424cae3f.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/metadata/v1.metadata.json
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/metadata/v2.metadata.json
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/metadata/v3.metadata.json
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/metadata/v4.metadata.json
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/metadata/v5.metadata.json
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage/metadata/version-hint.text
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/data/00000-0-e69cb204-0c90-4255-8b0b-7af3aec3f75d-0-00001.orc
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/data/00000-1-0ac66c53-638d-4aaf-9084-8a24b7aa2cdf-0-00001.orc
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/data/00000-2-f69a801d-ce1f-478e-98e4-f5321d122361-0-00001.orc
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/data/00000-3-84703627-8eea-44f1-a09b-e5bdad596090-0-00001.orc
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/metadata/3159a0a5-681d-4ac9-bf72-4be5814546cf-m0.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/metadata/4e12ed17-3e31-4d27-b35f-55467a2bf5fe-m0.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/metadata/8542d294-4d10-4efc-9e9d-69d3dce88108-m0.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/metadata/8542d294-4d10-4efc-9e9d-69d3dce88108-m1.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/metadata/8542d294-4d10-4efc-9e9d-69d3dce88108-m2.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/metadata/8542d294-4d10-4efc-9e9d-69d3dce88108-m3.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/metadata/e5f99ba8-b804-434f-aa9e-d51e86cc0180-m0.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/metadata/snap-1530771818348079345-1-3159a0a5-681d-4ac9-bf72-4be5814546cf.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/metadata/snap-7033898671372067760-1-4e12ed17-3e31-4d27-b35f-55467a2bf5fe.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/metadata/snap-7330590250419058232-1-8542d294-4d10-4efc-9e9d-69d3dce88108.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/metadata/snap-7480347588879981313-1-e5f99ba8-b804-434f-aa9e-d51e86cc0180.avro
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/metadata/v1.metadata.json
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/metadata/v2.metadata.json
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/metadata/v3.metadata.json
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/metadata/v4.metadata.json
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/metadata/v5.metadata.json
A testdata/data/iceberg_test/iceberg_v3/iceberg_v3_row_lineage_orc/metadata/version-hint.text
A testdata/workloads/functional-query/queries/QueryTest/iceberg-v3-row-lineage.test
M tests/query_test/test_iceberg.py
74 files changed, 750 insertions(+), 37 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/24042/3
--
To view, visit http://gerrit.cloudera.org:8080/24042
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I71b1076b25c9e7a0a6c9428b24abc986f5382c71
Gerrit-Change-Number: 24042
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Arnab Karmakar <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
