Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/24088 )

Change subject: IMPALA-14589: Add support for Iceberg V3 default values
......................................................................


Patch Set 3:

(14 comments)

Thanks for working on this!

http://gerrit.cloudera.org:8080/#/c/24088/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/24088/3//COMMIT_MSG@17
PS3, Line 17: 2. **write-default**: Applied when WRITING new rows that don't 
specify a
            :    value for a column. This ensures consistent defaults across all
            :    engines writing to the table, regardless of whether they're 
inserting
            :    via Spark, Flink, or Impala.
To make the patch smaller, this could be implemented in a different CR.


http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/hdfs-scan-node-base.h@288
PS3, Line 288:   /// Populates Iceberg default values into the given template 
tuple.
             :   /// This is called from InitTemplateTuple() for Iceberg tables 
to apply initial-default
             :   /// values to all top-level columns (complex type defaults are 
not supported).
             :   void PopulateIcebergDefaults(Tuple* template_tuple, MemPool* 
pool) const;
It could be moved to file-metadata-utils where we handle Iceberg specific 
features. It could be called by FileMetadataUtils::CreateTemplateTuple()


http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/hdfs-scan-node-base.cc@429
PS3, Line 429: // Partition key values override defaults if both exist
Do we have tests for this?


http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/hdfs-scan-node-base.cc@452
PS3, Line 452: partition_key_slots_
Iceberg partitions are handled differently, i.e. this would be always empty for 
Iceberg tables.


http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/orc/hdfs-orc-scanner.cc
File be/src/exec/orc/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/orc/hdfs-orc-scanner.cc@645
PS3, Line 645: !file_metadata_utils_.HasIcebergDefaultValue(slot_desc)) {
             :         // Only set NULL if the slot doesn't have a default 
value (which would have been
             :         // pre-populated in InitTemplateTuple during init phase)
             :         
(*template_tuple)->SetNull(slot_desc->null_indicator_offset());
I think this should be done by FileMetadataUtils::CreateTemplateTuple()


http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/orc/hdfs-orc-scanner.cc@685
PS3, Line 685: template_tuple->SetNotNull(slot_desc->null_indicator_offset())
Why does this patch need to touch Hive ACID fields?


http://gerrit.cloudera.org:8080/#/c/24088/1/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/24088/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@649
PS1, Line 649:       // We are selecting a column that is not in the file. If 
it doesn't have a default
             :       // value, we would set its slot to NULL during the scan, 
so any predicate would
             :       // evaluate to false. Return early. NULL comparisons 
cannot happen here, since
             :       // predicates with NULL literals are filtered in the 
frontend.
             :       if 
(!file_metadata_utils_.HasIcebergDefaultValue(slot_desc)) {
             :         *skip_row_group = true;
             :         break;
             :       }
             :       // If the column has a default value, we can't evaluate 
statistics for it since
             :       /
> I tried doing this but faced errors because:
I think we should introduce a new method in FileMetadataUtils(). Added details 
on the new PS.


http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/parquet/hdfs-parquet-scanner.cc@653
PS3, Line 653: !file_metadata_utils_.HasIcebergDefaultValue(slot_desc)
This check could be moved to FileMetadataUtils::NeedDataInFile()


http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/parquet/hdfs-parquet-scanner.cc@799
PS3, Line 799:       if 
(!file_metadata_utils_.HasIcebergDefaultValue(slot_desc)) {
Same as above.


http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/parquet/hdfs-parquet-scanner.cc@2928
PS3, Line 2928: NeedDataInFile
We could introduce a new method like 
"FileMetadataUtils::ShouldSkipReadFromFile()". That would return true for 
partition columns, but false for default values.


http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/parquet/hdfs-parquet-scanner.cc@2953
PS3, Line 2953: HasIcebergDefaultValue
Could be NeedDataInFile() once we modified it.


http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/runtime/descriptors.cc
File be/src/runtime/descriptors.cc:

http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/runtime/descriptors.cc@183
PS3, Line 183:     if (tdesc.__isset.icebergWriteDefault) {
             :       write_default_ = tdesc.icebergWriteDefault;
             :     }
I think we don't need this in the backend. The Frontend sets the default value.


http://gerrit.cloudera.org:8080/#/c/24088/3/common/thrift/Descriptors.thrift
File common/thrift/Descriptors.thrift:

http://gerrit.cloudera.org:8080/#/c/24088/3/common/thrift/Descriptors.thrift@79
PS3, Line 79:   7: optional string icebergWriteDefault
I think we don't need it in the backends.


http://gerrit.cloudera.org:8080/#/c/24088/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java:

http://gerrit.cloudera.org:8080/#/c/24088/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@253
PS3, Line 253:   private PlanNode createIcebergScanPlanImpl() throws 
ImpalaException {
We don't handle default values of nested fields. We could add a check for that.



--
To view, visit http://gerrit.cloudera.org:8080/24088
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f1be994a336b30b17b17819091417d777a39be9
Gerrit-Change-Number: 24088
Gerrit-PatchSet: 3
Gerrit-Owner: Arnab Karmakar <[email protected]>
Gerrit-Reviewer: Arnab Karmakar <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Thu, 09 Apr 2026 09:34:01 +0000
Gerrit-HasComments: Yes

Reply via email to