Arnab Karmakar has posted comments on this change. ( http://gerrit.cloudera.org:8080/24088 )
Change subject: IMPALA-14589: [Part 1] Add support for Iceberg V3 default values ...................................................................... Patch Set 5: (14 comments) http://gerrit.cloudera.org:8080/#/c/24088/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/24088/3//COMMIT_MSG@17 PS3, Line 17: 2. **write-default**: Applied when WRITING new rows that don't specify a : value for a column. This ensures consistent defaults across all : engines writing to the table, regardless of whether they're inserting : via Spark, Flink, or Impala. > To make the patch smaller, this could be implemented in a different CR. Removed the write-default changes from InsertStmt.java and the test cases. We are still setting the write-default value in the frontend though. http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/hdfs-scan-node-base.h@288 PS3, Line 288: /// Returns index into materialized_slots with 'path'. Returns SKIP_COLUMN if : /// that path is not materialized. Only valid to call after Init(). : int GetMaterializedSlotIdx(const std::vector<int>& path) const; : > It could be moved to file-metadata-utils where we handle Iceberg specific f Done http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/hdfs-scan-node-base.cc@429 PS3, Line 429: auto result = path_to_materialized_slot_idx_.find(path) > Do we have tests for this? I added new tests, using a partitioned table with initial default for the partition column. The partition key is overriding the defaults. http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/hdfs-scan-node-base.cc@452 PS3, Line 452: e.use_mt_scan_node ? > Iceberg partitions are handled differently, i.e. this would be always empty Done http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/orc/hdfs-orc-scanner.cc File be/src/exec/orc/hdfs-orc-scanner.cc: http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/orc/hdfs-orc-scanner.cc@645 PS3, Line 645: LL if the field is missing and we need data from the file. : (*template_tuple)->SetNull(slot_desc->null_indicator_offset()); : } : missing_field_slots_.insert(slot_desc); > I think this should be done by FileMetadataUtils::CreateTemplateTuple() PopulateIcebergDefaults() only sets NotNull. We need to modify this condition as this just sets the slots to NULL unconditionally. Now we'll be checking: ``` else if (file_metadata_utils_.NeedDataInFile(slot_desc)) { // Set NULL if the field is missing and we need data from the file. (*template_tuple)->SetNull(slot_desc->null_indicator_offset()); } ``` http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/orc/hdfs-orc-scanner.cc@685 PS3, Line 685: se ACID_FIELD_CURRENT_TRANSACTION_INDEX: > Why does this patch need to touch Hive ACID fields? Removed this as its being done by PopulateIcebergDefaults(). http://gerrit.cloudera.org:8080/#/c/24088/1/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/24088/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@649 PS1, Line 649: // We are selecting a column that is not in the file. We would set its slot to NULL : // during the scan, so any predicate would evaluate to false. Return early. NULL : // comparisons cannot happen here, since predicates with NULL literals are filtered : // in the frontend. : *skip_row_group = true; : break; : } : : ColumnStatsReader stats_reader = : > I think we should introduce a new method in FileMetadataUtils(). Added deta Done http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/parquet/hdfs-parquet-scanner.cc@653 PS3, Line 653: p_row_group = true; > This check could be moved to FileMetadataUtils::NeedDataInFile() Done http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/parquet/hdfs-parquet-scanner.cc@799 PS3, Line 799: > Same as above. Done http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/parquet/hdfs-parquet-scanner.cc@2928 PS3, Line 2928: ; > Or: Done http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/parquet/hdfs-parquet-scanner.cc@2953 PS3, Line 2953: > Could be NeedDataInFile() once we modified it. Done http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/runtime/descriptors.cc File be/src/runtime/descriptors.cc: http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/runtime/descriptors.cc@183 PS3, Line 183: } : } : > I think we don't need this in the backend. The Frontend sets the default va Done http://gerrit.cloudera.org:8080/#/c/24088/3/common/thrift/Descriptors.thrift File common/thrift/Descriptors.thrift: http://gerrit.cloudera.org:8080/#/c/24088/3/common/thrift/Descriptors.thrift@79 PS3, Line 79: } > I think we don't need it in the backends. Done http://gerrit.cloudera.org:8080/#/c/24088/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java: http://gerrit.cloudera.org:8080/#/c/24088/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@253 PS3, Line 253: private PlanNode createIcebergScanPlanImpl() throws ImpalaException { > We don't handle default values of nested fields. We could add a check for t Done -- To view, visit http://gerrit.cloudera.org:8080/24088 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9f1be994a336b30b17b17819091417d777a39be9 Gerrit-Change-Number: 24088 Gerrit-PatchSet: 5 Gerrit-Owner: Arnab Karmakar <[email protected]> Gerrit-Reviewer: Arnab Karmakar <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Mon, 13 Apr 2026 17:57:26 +0000 Gerrit-HasComments: Yes
