Arnab Karmakar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/24088 )

Change subject: IMPALA-14589: [Part 1] Add support for Iceberg V3 default values
......................................................................


Patch Set 5:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/24088/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/24088/3//COMMIT_MSG@17
PS3, Line 17: 2. **write-default**: Applied when WRITING new rows that don't 
specify a
            :    value for a column. This ensures consistent defaults across all
            :    engines writing to the table, regardless of whether they're 
inserting
            :    via Spark, Flink, or Impala.
> To make the patch smaller, this could be implemented in a different CR.
Removed the write-default changes from InsertStmt.java and the test cases. We 
are still setting the write-default value in the frontend though.


http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/hdfs-scan-node-base.h@288
PS3, Line 288:   /// Returns index into materialized_slots with 'path'.  
Returns SKIP_COLUMN if
             :   /// that path is not materialized. Only valid to call after 
Init().
             :   int GetMaterializedSlotIdx(const std::vector<int>& path) const;
             :
> It could be moved to file-metadata-utils where we handle Iceberg specific f
Done


http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/hdfs-scan-node-base.cc@429
PS3, Line 429: auto result = path_to_materialized_slot_idx_.find(path)
> Do we have tests for this?
I added new tests, using a partitioned table with initial default for the 
partition column. The partition key is overriding the defaults.


http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/hdfs-scan-node-base.cc@452
PS3, Line 452: e.use_mt_scan_node ?
> Iceberg partitions are handled differently, i.e. this would be always empty
Done


http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/orc/hdfs-orc-scanner.cc
File be/src/exec/orc/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/orc/hdfs-orc-scanner.cc@645
PS3, Line 645: LL if the field is missing and we need data from the file.
             :         
(*template_tuple)->SetNull(slot_desc->null_indicator_offset());
             :       }
             :       missing_field_slots_.insert(slot_desc);
> I think this should be done by FileMetadataUtils::CreateTemplateTuple()
PopulateIcebergDefaults() only sets NotNull. We need to modify this condition 
as this just sets the slots to NULL unconditionally. Now we'll be checking:
```
else if (file_metadata_utils_.NeedDataInFile(slot_desc)) {
        // Set NULL if the field is missing and we need data from the file.
        (*template_tuple)->SetNull(slot_desc->null_indicator_offset());
      }
```


http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/orc/hdfs-orc-scanner.cc@685
PS3, Line 685: se ACID_FIELD_CURRENT_TRANSACTION_INDEX:
> Why does this patch need to touch Hive ACID fields?
Removed this as its being done by PopulateIcebergDefaults().


http://gerrit.cloudera.org:8080/#/c/24088/1/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/24088/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@649
PS1, Line 649:       // We are selecting a column that is not in the file. We 
would set its slot to NULL
             :       // during the scan, so any predicate would evaluate to 
false. Return early. NULL
             :       // comparisons cannot happen here, since predicates with 
NULL literals are filtered
             :       // in the frontend.
             :       *skip_row_group = true;
             :       break;
             :     }
             :
             :     ColumnStatsReader stats_reader =
             :
> I think we should introduce a new method in FileMetadataUtils(). Added deta
Done


http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/parquet/hdfs-parquet-scanner.cc@653
PS3, Line 653: p_row_group = true;
> This check could be moved to FileMetadataUtils::NeedDataInFile()
Done


http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/parquet/hdfs-parquet-scanner.cc@799
PS3, Line 799:
> Same as above.
Done


http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/parquet/hdfs-parquet-scanner.cc@2928
PS3, Line 2928: ;
> Or:
Done


http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/exec/parquet/hdfs-parquet-scanner.cc@2953
PS3, Line 2953:
> Could be NeedDataInFile() once we modified it.
Done


http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/runtime/descriptors.cc
File be/src/runtime/descriptors.cc:

http://gerrit.cloudera.org:8080/#/c/24088/3/be/src/runtime/descriptors.cc@183
PS3, Line 183:   }
             : }
             :
> I think we don't need this in the backend. The Frontend sets the default va
Done


http://gerrit.cloudera.org:8080/#/c/24088/3/common/thrift/Descriptors.thrift
File common/thrift/Descriptors.thrift:

http://gerrit.cloudera.org:8080/#/c/24088/3/common/thrift/Descriptors.thrift@79
PS3, Line 79: }
> I think we don't need it in the backends.
Done


http://gerrit.cloudera.org:8080/#/c/24088/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java:

http://gerrit.cloudera.org:8080/#/c/24088/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@253
PS3, Line 253:   private PlanNode createIcebergScanPlanImpl() throws 
ImpalaException {
> We don't handle default values of nested fields. We could add a check for t
Done



--
To view, visit http://gerrit.cloudera.org:8080/24088
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f1be994a336b30b17b17819091417d777a39be9
Gerrit-Change-Number: 24088
Gerrit-PatchSet: 5
Gerrit-Owner: Arnab Karmakar <[email protected]>
Gerrit-Reviewer: Arnab Karmakar <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Mon, 13 Apr 2026 17:57:26 +0000
Gerrit-HasComments: Yes

Reply via email to