Hello Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/24141
to look at the new patch set (#3).
Change subject: IMPALA-13273: Disallow writing NULL values in non-nullable
columns
......................................................................
IMPALA-13273: Disallow writing NULL values in non-nullable columns
Before this patch we didn't enforce NOT NULL constraints during
writing. This caused issues with Iceberg tables with non-nullable
columns. E.g.:
  create table t_ice_constr(c1 int not null) stored as iceberg;
  insert into t_ice_constr select null;
  select c1 from t_ice_constr;
The above select returned a non-NULL value instead of NULL, because the
slot descriptor associated with column 'c1' was not nullable, so it did
not even have a null indicator bit.
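To illustrate why a reader cannot see NULL through such a slot, here is a
minimal sketch. SlotSketch and IsNull are hypothetical, simplified names
invented for this example; they are not Impala's actual descriptor classes
and only assume that nullability is tracked by one indicator bit per
nullable slot.

```cpp
#include <cstdint>

// Hypothetical, simplified slot descriptor (an assumption for illustration).
struct SlotSketch {
  bool is_nullable;            // false for a NOT NULL column
  int null_byte_offset;        // byte holding the indicator bit, if nullable
  std::uint8_t null_bit_mask;  // mask selecting the indicator bit, if nullable
};

// Reports whether the slot is NULL in the given tuple buffer.
inline bool IsNull(const std::uint8_t* tuple, const SlotSketch& slot) {
  // A non-nullable slot has no indicator bit, so NULL cannot be represented
  // and whatever bytes sit in the slot are read as a value.
  if (!slot.is_nullable) return false;
  return (tuple[slot.null_byte_offset] & slot.null_bit_mask) != 0;
}
```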
The fix is to forbid writing NULLs in non-nullable columns in the
first place. This is now enforced in the Parquet writer's
FinalizeCurrentPage() function where we have statistics about the
number of NULLs written.
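The check described above can be sketched roughly as follows. This is an
assumption-laden illustration, not the code in this patch: PageStatsSketch,
ColumnSketch, CheckResult and CheckPageNulls are hypothetical names, and the
error text is made up.

```cpp
#include <cstdint>
#include <string>

// Hypothetical per-page statistics gathered while values are appended.
struct PageStatsSketch {
  std::int64_t num_values;
  std::int64_t null_count;
};

// Hypothetical column metadata; is_nullable is false for NOT NULL columns.
struct ColumnSketch {
  std::string name;
  bool is_nullable;
};

// Hypothetical result type standing in for a real status object.
struct CheckResult {
  bool ok;
  std::string message;
};

// Called when a page is finalized: reject the page if a non-nullable
// column received at least one NULL.
inline CheckResult CheckPageNulls(const ColumnSketch& col,
                                  const PageStatsSketch& stats) {
  if (!col.is_nullable && stats.null_count > 0) {
    return {false,
            "NULL value written to non-nullable column '" + col.name + "'"};
  }
  return {true, ""};
}
```

Checking once per page relies on the NULL count the writer already tracks,
so no per-row branch is needed.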
Schema evolution concerns
* Iceberg allows making a required column optional (via
UpdateSchema.makeColumnOptional()). This is a compatible change,
because a reader that expects optional values has no problem if the
values are always present in the data files.
* Iceberg has UpdateSchema.requireColumn(), but it is only allowed
if users call allowIncompatibleChanges() as well, as it can break
reading older data.
* Iceberg also has UpdateSchema.addRequiredColumn(), but users should
also set a default value so that readers do not break. Iceberg only
allows adding new required columns without default values if users
explicitly call allowIncompatibleChanges().
* Iceberg says users should only call allowIncompatibleChanges() if
they have validated that all of their old data files are compatible.
For example, a column was optional (nullable), but all old data files
contain values for that column.
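The validation mentioned in the last bullet could look roughly like the
sketch below. It is not part of this patch and not an Iceberg API:
DataFileSketch and AllFilesHaveValues are hypothetical names, and the sketch
assumes per-file NULL counts keyed by field id are available. A file without
a recorded count for the column is treated as not validated.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical summary of one data file's column statistics.
struct DataFileSketch {
  std::string path;
  std::map<int, std::int64_t> null_value_counts;  // field id -> NULL count
};

// Returns true only if every file is known to contain no NULLs for the
// given field, i.e. making the column required would not break reads.
inline bool AllFilesHaveValues(const std::vector<DataFileSketch>& files,
                               int field_id) {
  for (const DataFileSketch& f : files) {
    auto it = f.null_value_counts.find(field_id);
    if (it == f.null_value_counts.end()) return false;  // unknown: not safe
    if (it->second > 0) return false;                   // file contains NULLs
  }
  return true;
}
```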
Testing
* e2e tests added
Change-Id: I1189da3094beee615a5d4600576febde4be8473d
---
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/parquet/parquet-column-stats.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/Descriptors.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/IcebergColumn.java
M fe/src/main/java/org/apache/impala/catalog/KuduColumn.java
M fe/src/main/java/org/apache/impala/catalog/paimon/PaimonColumn.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
12 files changed, 70 insertions(+), 46 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/24141/3
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1189da3094beee615a5d4600576febde4be8473d
Gerrit-Change-Number: 24141
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>