Zoltan Borok-Nagy has uploaded this change for review. (http://gerrit.cloudera.org:8080/24141)


Change subject: IMPALA-13273: Disallow writing NULL values in non-nullable columns
......................................................................

IMPALA-13273: Disallow writing NULL values in non-nullable columns

Before this patch we didn't enforce NOT NULL constraints during
writing. This caused issues with Iceberg tables with non-nullable
columns. E.g.:

create table t_ice_constr(c1 int not null) stored as iceberg;
insert into t_ice_constr select null;
select c1 from t_ice_constr;

The above select returned a value instead of NULL, because the slot
descriptor associated with column 'c1' was not nullable, so it did not
even have a null indicator bit.
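To illustrate the failure mode, here is a minimal C++ sketch. It assumes
a simplified tuple layout where nullable slots own a null indicator bit
and non-nullable slots own none. SlotDesc, TupleSketch and ReadIntSlot
are hypothetical names, not Impala's actual classes. Because a NULL
cannot be represented in a slot without an indicator bit, the reader
returns whatever bytes happen to sit in the slot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>

// Hypothetical, simplified slot descriptor; not Impala's SlotDescriptor.
struct SlotDesc {
  int slot_offset;               // byte offset of the value in the tuple
  std::optional<int> null_byte;  // absent for non-nullable slots
  uint8_t null_mask;             // bit within null_byte marking NULL
};

// Hypothetical fixed-size tuple buffer; not Impala's Tuple class.
struct TupleSketch {
  uint8_t data[16] = {};

  bool IsNull(const SlotDesc& s) const {
    // Without a null indicator bit the slot can never report NULL.
    if (!s.null_byte) return false;
    return (data[*s.null_byte] & s.null_mask) != 0;
  }

  int32_t GetInt(const SlotDesc& s) const {
    assert(s.slot_offset >= 0 &&
           static_cast<std::size_t>(s.slot_offset) + sizeof(int32_t) <=
               sizeof(data));
    int32_t v;
    std::memcpy(&v, data + s.slot_offset, sizeof(v));
    return v;
  }
};

// Reads an INT slot, mapping a set null bit to std::nullopt.
std::optional<int32_t> ReadIntSlot(const TupleSketch& t, const SlotDesc& s) {
  if (t.IsNull(s)) return std::nullopt;
  return t.GetInt(s);
}
```

With the same tuple bytes, a nullable view of the slot reports NULL,
while a non-nullable view of the same slot reports a value.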

The fix is to forbid writing NULLs in non-nullable columns in the
first place. This is now enforced in the Parquet writer's
FinalizeCurrentPage() function, where we have statistics about the
number of NULLs written.
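A minimal sketch of such a check, assuming a writer that keeps a
per-page null count. Status, PageStats and ColumnWriterSketch are
hypothetical stand-ins, not the actual types in
hdfs-parquet-table-writer.cc or parquet-column-stats.h.

```cpp
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical stand-in for a status type.
struct Status {
  bool ok = true;
  std::string msg;
  static Status OK() { return {}; }
  static Status Error(std::string m) { return {false, std::move(m)}; }
};

// Hypothetical per-page statistics collected while values are appended.
struct PageStats {
  int64_t num_values = 0;
  int64_t null_count = 0;
};

class ColumnWriterSketch {
 public:
  ColumnWriterSketch(std::string name, bool nullable)
      : name_(std::move(name)), nullable_(nullable) {}

  // Records one value; is_null marks a NULL.
  void Append(bool is_null) {
    ++page_stats_.num_values;
    if (is_null) ++page_stats_.null_count;
  }

  // Validates the page before it would be flushed, using the null count
  // that is already part of the page statistics.
  Status FinalizeCurrentPage() {
    if (!nullable_ && page_stats_.null_count > 0) {
      return Status::Error("NULL value in non-nullable column '" + name_ +
                           "'");
    }
    page_stats_ = PageStats{};  // start a fresh page
    return Status::OK();
  }

 private:
  std::string name_;
  bool nullable_;
  PageStats page_stats_;
};
```

In this sketch the constraint check reuses the null count kept for
statistics, so no extra per-value bookkeeping is needed for it.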

Schema evolution concerns
* Iceberg allows making a required column optional (via
  UpdateSchema.makeColumnOptional()). This is a compatible change:
  if a reader expects optional values, it is not a problem if the
  values are always present in the data files.
* Iceberg has UpdateSchema.requireColumn(), but it is only allowed
  if users also call allowIncompatibleChanges(), since it can break
  reading older data.
* Iceberg also has UpdateSchema.addRequiredColumn(), but users should
  also set a default value so that readers do not break. Iceberg only
  allows adding a new required column without a default value if the
  user explicitly calls allowIncompatibleChanges().
* Iceberg's documentation says users should only call
  allowIncompatibleChanges() if they have validated that all of their
  old data files are compatible, e.g. a column was optional (nullable),
  but all old data files contain values for that column.
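The rules listed above can be summarized as a small decision function.
This is a hypothetical C++ model of the rules as stated in this commit
message, not Iceberg's API; the enum values merely mirror the
UpdateSchema methods named in each bullet.

```cpp
#include <cassert>

// Hypothetical model of the schema changes discussed above; not Iceberg
// types.
enum class SchemaChange {
  kMakeColumnOptional,        // mirrors UpdateSchema.makeColumnOptional()
  kRequireColumn,             // mirrors UpdateSchema.requireColumn()
  kAddRequiredWithDefault,    // addRequiredColumn() with a default value
  kAddRequiredWithoutDefault  // addRequiredColumn() without a default
};

// Returns true if the change is allowed, given whether the caller opted
// in via allowIncompatibleChanges().
bool IsChangeAllowed(SchemaChange change, bool allow_incompatible) {
  switch (change) {
    case SchemaChange::kMakeColumnOptional:
    case SchemaChange::kAddRequiredWithDefault:
      return true;  // compatible: older data files stay readable
    case SchemaChange::kRequireColumn:
    case SchemaChange::kAddRequiredWithoutDefault:
      return allow_incompatible;  // may break reading older data
  }
  return false;
}
```

Only the compatible changes pass without the opt-in, which is why
Impala's writer must not produce NULLs for columns that remain required.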

Testing
 * e2e tests added

Change-Id: I1189da3094beee615a5d4600576febde4be8473d
---
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/parquet/parquet-column-stats.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/Descriptors.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/IcebergColumn.java
M fe/src/main/java/org/apache/impala/catalog/KuduColumn.java
M fe/src/main/java/org/apache/impala/catalog/paimon/PaimonColumn.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
12 files changed, 69 insertions(+), 47 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/24141/1
--
To view, visit http://gerrit.cloudera.org:8080/24141

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I1189da3094beee615a5d4600576febde4be8473d
Gerrit-Change-Number: 24141
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy
