Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/24088 )
Change subject: IMPALA-14589: Add support for Iceberg V3 default values ...................................................................... Patch Set 3: (14 comments) Thanks for working on this! http://gerrit.cloudera.org:8080/#/c/24088/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/24088/3//COMMIT_MSG@17 PS3, Line 17: 2. **write-default**: Applied when WRITING new rows that don't specify a : value for a column. This ensures consistent defaults across all : engines writing to the table, regardless of whether they're inserting : via Spark, Flink, or Impala. To make the patch smaller, this could be implemented in a different CR. http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/hdfs-scan-node-base.h@288 PS3, Line 288: /// Populates Iceberg default values into the given template tuple. : /// This is called from InitTemplateTuple() for Iceberg tables to apply initial-default : /// values to all top-level columns (complex type defaults are not supported). : void PopulateIcebergDefaults(Tuple* template_tuple, MemPool* pool) const; It could be moved to file-metadata-utils where we handle Iceberg specific features. It could be called by FileMetadataUtils::CreateTemplateTuple() http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/hdfs-scan-node-base.cc@429 PS3, Line 429: // Partition key values override defaults if both exist Do we have tests for this? http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/hdfs-scan-node-base.cc@452 PS3, Line 452: partition_key_slots_ Iceberg partitions are handled differently, i.e. this would be always empty for Iceberg tables. http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/orc/hdfs-orc-scanner.cc File be/src/exec/orc/hdfs-orc-scanner.cc: http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/orc/hdfs-orc-scanner.cc@645 PS3, Line 645: !file_metadata_utils_.HasIcebergDefaultValue(slot_desc)) { : // Only set NULL if the slot doesn't have a default value (which would have been : // pre-populated in InitTemplateTuple during init phase) : (*template_tuple)->SetNull(slot_desc->null_indicator_offset()); I think this should be done by FileMetadataUtils::CreateTemplateTuple() http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/orc/hdfs-orc-scanner.cc@685 PS3, Line 685: template_tuple->SetNotNull(slot_desc->null_indicator_offset()) Why does this patch need to touch Hive ACID fields? http://gerrit.cloudera.org:8080/#/c/24088/1/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/24088/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@649 PS1, Line 649: // We are selecting a column that is not in the file. If it doesn't have a default : // value, we would set its slot to NULL during the scan, so any predicate would : // evaluate to false. Return early. NULL comparisons cannot happen here, since : // predicates with NULL literals are filtered in the frontend. : if (!file_metadata_utils_.HasIcebergDefaultValue(slot_desc)) { : *skip_row_group = true; : break; : } : // If the column has a default value, we can't evaluate statistics for it since : / > I tried doing this but faced errors because: I think we should introduce a new method in FileMetadataUtils(). Added details on the new PS. http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/parquet/hdfs-parquet-scanner.cc@653 PS3, Line 653: !file_metadata_utils_.HasIcebergDefaultValue(slot_desc) This check could be moved to FileMetadataUtils::NeedDataInFile() http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/parquet/hdfs-parquet-scanner.cc@799 PS3, Line 799: if (!file_metadata_utils_.HasIcebergDefaultValue(slot_desc)) { Same as above. http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/parquet/hdfs-parquet-scanner.cc@2928 PS3, Line 2928: NeedDataInFile We could introduce a new method like "FileMetadataUtils::ShouldSkipReadFromFile()". That would return true for partition columns, but false for default values. http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/parquet/hdfs-parquet-scanner.cc@2953 PS3, Line 2953: HasIcebergDefaultValue Could be NeedDataInFile() once we modified it. http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/runtime/descriptors.cc File be/src/runtime/descriptors.cc: http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/runtime/descriptors.cc@183 PS3, Line 183: if (tdesc.__isset.icebergWriteDefault) { : write_default_ = tdesc.icebergWriteDefault; : } I think we don't need this in the backend. The Frontend sets the default value. http://gerrit.cloudera.org:8080/#/c/24088/3/common/thrift/Descriptors.thrift File common/thrift/Descriptors.thrift: http://gerrit.cloudera.org:8080/#/c/24088/3/common/thrift/Descriptors.thrift@79 PS3, Line 79: 7: optional string icebergWriteDefault I think we don't need it in the backends. http://gerrit.cloudera.org:8080/#/c/24088/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java: http://gerrit.cloudera.org:8080/#/c/24088/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@253 PS3, Line 253: private PlanNode createIcebergScanPlanImpl() throws ImpalaException { We don't handle default values of nested fields. We could add a check for that. -- To view, visit http://gerrit.cloudera.org:8080/24088 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9f1be994a336b30b17b17819091417d777a39be9 Gerrit-Change-Number: 24088 Gerrit-PatchSet: 3 Gerrit-Owner: Arnab Karmakar <[email protected]> Gerrit-Reviewer: Arnab Karmakar <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Thu, 09 Apr 2026 09:34:01 +0000 Gerrit-HasComments: Yes
