Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16545 )


Change subject: IMPALA-10215: Implement INSERT INTO for non-partitioned Iceberg tables (Parquet)
......................................................................

IMPALA-10215: Implement INSERT INTO for non-partitioned Iceberg tables (Parquet)

This commit adds support for INSERT INTO statements against Iceberg
tables when the table is non-partitioned and the underlying file format
is Parquet.

We still use Impala's HdfsParquetTableWriter to write the data files,
though it needed some modifications to conform to the Iceberg spec,
namely:
 * write Iceberg/Parquet 'field_id' for the columns
 * TIMESTAMPs are encoded as INT64 micros (without time zone)
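
The two writer changes above are illustrated below in a minimal Python sketch. It is not the HdfsParquetTableWriter code. It assumes the in-memory timestamp is available as a calendar date plus nanoseconds since midnight, which is roughly how Impala's TimestampValue is organized. The names `to_iceberg_timestamp_micros`, `encode_int64_plain`, `SchemaElement` and `build_parquet_schema` are hypothetical. `SchemaElement` is a heavily simplified stand-in for the Parquet Thrift struct of the same name, which does carry an optional `field_id`.

```python
import struct
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple

NANOS_PER_DAY = 86_400 * 1_000_000_000
MICROS_PER_DAY = 86_400 * 1_000_000
_UNIX_EPOCH = date(1970, 1, 1)


def to_iceberg_timestamp_micros(day: date, nanos_of_day: int) -> int:
    """(date, nanoseconds since midnight) -> microseconds since the Unix epoch.

    The value is treated as wall-clock time: no time zone conversion is
    applied, matching Iceberg's TIMESTAMP (without time zone).
    Sub-microsecond digits are truncated.
    """
    if not 0 <= nanos_of_day < NANOS_PER_DAY:
        raise ValueError(f"nanos_of_day out of range: {nanos_of_day}")
    days_since_epoch = (day - _UNIX_EPOCH).days
    return days_since_epoch * MICROS_PER_DAY + nanos_of_day // 1_000


def encode_int64_plain(value: int) -> bytes:
    """Parquet PLAIN encoding of an INT64 value: 8 bytes, little-endian."""
    return struct.pack("<q", value)


@dataclass
class SchemaElement:
    """Hypothetical, simplified stand-in for Parquet's SchemaElement."""
    name: str
    physical_type: str
    field_id: Optional[int] = None


def build_parquet_schema(columns: List[Tuple[str, str, int]]) -> List[SchemaElement]:
    """columns: (name, Parquet physical type, Iceberg field id) triples.

    Each column's Iceberg field id is copied into the Parquet field_id.
    """
    return [SchemaElement(name, ptype, fid) for name, ptype, fid in columns]


if __name__ == "__main__":
    micros = to_iceberg_timestamp_micros(date(2000, 3, 1), 12 * 3600 * 1_000_000_000)
    print(micros, encode_int64_plain(micros).hex())
    print(build_parquet_schema([("id", "INT64", 1), ("ts", "INT64", 2)]))
```

The sketch covers only the value encoding. The Iceberg spec's Parquet mapping also expects such a column to be annotated as a TIMESTAMP logical type with unit MICROS and isAdjustedToUTC=false.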

We use DmlExecState to transfer information from the table sink
operators to the coordinator, and the coordinator then invokes the
AppendFiles API through JNI. DmlExecState is encoded in protobuf,
while communication with the Frontend uses Thrift. Therefore, to avoid
defining the Iceberg DataFile structure in both formats, DataFiles are
stored in FlatBuffers.
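
The idea behind this design is sketched below in Python: a DataFile is serialized once into opaque bytes, and both the protobuf message and the Thrift message simply carry those bytes. The sketch does not use FlatBuffers. A hand-rolled `struct` layout stands in for the schema in IcebergObjects.fbs, whose actual fields are not shown here. `DataFileInfo`, `DmlExecStateMsg`, `FrontendCommitRequest` and `coordinator_merge` are hypothetical names chosen for illustration.

```python
import struct
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass(frozen=True)
class DataFileInfo:
    """Hypothetical subset of Iceberg DataFile metadata for one written file."""
    path: str
    record_count: int
    file_size_in_bytes: int


# Fixed header: record_count (int64), file size (int64), path length (uint32).
_HEADER = struct.Struct("<qqI")


def encode_data_file(df: DataFileInfo) -> bytes:
    """Serialize once on the executor; the result is handled as opaque bytes."""
    path = df.path.encode("utf-8")
    return _HEADER.pack(df.record_count, df.file_size_in_bytes, len(path)) + path


def decode_data_file(buf: bytes) -> DataFileInfo:
    """Deserialize where the metadata is finally consumed (Frontend side)."""
    record_count, file_size, path_len = _HEADER.unpack_from(buf, 0)
    start = _HEADER.size
    path = buf[start:start + path_len].decode("utf-8")
    return DataFileInfo(path, record_count, file_size)


@dataclass
class DmlExecStateMsg:
    """Stand-in for the protobuf message: just a repeated bytes field."""
    data_files: List[bytes] = field(default_factory=list)


@dataclass
class FrontendCommitRequest:
    """Stand-in for the Thrift struct sent to the Frontend: a list<binary>."""
    data_files: List[bytes] = field(default_factory=list)


def coordinator_merge(states: Iterable[DmlExecStateMsg]) -> FrontendCommitRequest:
    """Forward every per-instance blob unchanged; nothing is re-encoded."""
    request = FrontendCommitRequest()
    for state in states:
        request.data_files.extend(state.data_files)
    return request


if __name__ == "__main__":
    a = DmlExecStateMsg([encode_data_file(DataFileInfo("/t/data/a.parq", 10, 1024))])
    b = DmlExecStateMsg([encode_data_file(DataFileInfo("/t/data/b.parq", 5, 512))])
    req = coordinator_merge([a, b])
    print([decode_data_file(blob) for blob in req.data_files])
```

With this approach, the record layout is defined in a single schema. The protobuf and Thrift definitions only need a `bytes` / `binary` field, so neither IDL has to duplicate the DataFile structure.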

The commit also makes some corrections to the Impala type <-> Iceberg
type mapping:
 * Impala TIMESTAMP is Iceberg TIMESTAMP (without time zone)
 * Impala CHAR is Iceberg FIXED
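
The corrected mapping is illustrated below with a partial Python sketch that translates Impala type names into Iceberg type strings, using the Iceberg spec's notation. Only the TIMESTAMP and CHAR rows come from this change. The other rows are our assumption of the obvious primitive correspondences, and CHAR(n) is assumed to map to a fixed type of the same length. `impala_to_iceberg` is a hypothetical helper, not the IcebergUtil implementation.

```python
import re

# Primitive Impala types with a direct Iceberg counterpart (partial list).
_SIMPLE_TYPES = {
    "BOOLEAN": "boolean",
    "INT": "int",
    "BIGINT": "long",
    "FLOAT": "float",
    "DOUBLE": "double",
    "STRING": "string",
    # Per this change: Iceberg timestamp WITHOUT time zone (not timestamptz).
    "TIMESTAMP": "timestamp",
}

_CHAR_RE = re.compile(r"CHAR\((\d+)\)")
_DECIMAL_RE = re.compile(r"DECIMAL\((\d+),\s*(\d+)\)")


def impala_to_iceberg(impala_type: str) -> str:
    """Map an Impala type name to an Iceberg type string (spec notation)."""
    t = impala_type.strip().upper()
    if t in _SIMPLE_TYPES:
        return _SIMPLE_TYPES[t]
    m = _CHAR_RE.fullmatch(t)
    if m:
        # Per this change: CHAR(n) becomes a fixed-length type
        # (same length assumed here).
        return f"fixed({m.group(1)})"
    m = _DECIMAL_RE.fullmatch(t)
    if m:
        return f"decimal({m.group(1)},{m.group(2)})"
    raise ValueError(f"type not covered by this sketch: {impala_type}")


if __name__ == "__main__":
    for name in ("TIMESTAMP", "CHAR(10)", "BIGINT", "DECIMAL(9, 2)"):
        print(name, "->", impala_to_iceberg(name))
```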

Testing:
 * Added INSERT tests to iceberg-insert.test
 * Added negative tests to iceberg-negative.test
 * I also did some manual testing with Spark. Spark is able to read
   Iceberg tables written by Impala as long as they contain no
   TIMESTAMP columns. If they do, Spark rejects the data files because
   it only accepts TIMESTAMPs with time zone.

Change-Id: I5690fb6c2cc51f0033fa26caf8597c80a11bcd8e
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
A be/src/exec/output-partition.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.h
M be/src/exec/parquet/parquet-metadata-utils.cc
M be/src/exec/parquet/parquet-metadata-utils.h
M be/src/runtime/coordinator.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M common/fbs/CMakeLists.txt
A common/fbs/IcebergObjects.fbs
M common/protobuf/control_service.proto
M common/thrift/CatalogObjects.thrift
M common/thrift/Descriptors.thrift
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/catalog/IcebergColumn.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
A testdata/workloads/functional-query/queries/QueryTest/iceberg-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M tests/query_test/test_iceberg.py
36 files changed, 712 insertions(+), 150 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/16545/1
--
To view, visit http://gerrit.cloudera.org:8080/16545
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I5690fb6c2cc51f0033fa26caf8597c80a11bcd8e
Gerrit-Change-Number: 16545
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
