Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/20760 )
Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables. ...................................................................... Patch Set 5: (14 comments) Yet another partial review until be/src/exec/iceberg-delete-sink-base.cc http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-buffered-delete-sink.h File be/src/exec/iceberg-buffered-delete-sink.h: http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-buffered-delete-sink.h@35 PS4, Line 35: class IcebergBufferedDeleteSink : public IcebergDeleteSinkBase { There could be a comment describing when this class is/should be used, e.g. what the difference between this and IcebergDeleteSink is. http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-buffered-delete-sink.cc File be/src/exec/iceberg-buffered-delete-sink.cc: http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-buffered-delete-sink.cc@345 PS4, Line 345: void IcebergBufferedDeleteSink::Close(RuntimeState* state) { See comment at iceberg-delete-sink-base.cc:66. Possibly we could convert it into DCHECK(closed_). http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.h File be/src/exec/iceberg-delete-sink-base.h: http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.h@30 PS4, Line 30: IcebergDeleteSinkBase Could we make this class pure virtual? http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.h@63 PS4, Line 63: std::string HumanReadablePartitionValue( An example would be useful. Could you explain what 'transform_result' is? How should it be interpreted and how should the return value be interpreted if 'transform_status' is an error? http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.h@70 PS4, Line 70: DmlExecState dml_exec_state_; Does the unbuffered IcebergDeleteSink also need a separate DmlExecState? Is that used for updates too? Or are updates always handled by the buffered IcebergBufferedDeleteSink? http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.cc File be/src/exec/iceberg-delete-sink-base.cc: http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.cc@66 PS4, Line 66: if (closed_) return; The 'closed_' field is a bit strange, AFAICS it comes from DataSink and the comment there writes that "subclasses should check and set this in Close()." But in DataSink::Close() 'closed_' is set to true. Then some subclasses like TableSinkBase and this one don't set it while others like IcebergBufferedDeleteSink do. I think either the comment at 'DataSink::closed_' should be updated (if DataSink is responsible for setting 'closed_') OR we should have all concrete subclasses set it (and make intermediate classes that are never instantiated into pure virtual classes). http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.cc@88 PS4, Line 88: OutputPartition* output_partition) { Nit: doesn't it fit on the previous line? http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.cc@100 PS4, Line 100: s Why is it plural? Maybe it could be 'partition_values_eval', like the argument on L111. http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.cc@104 PS4, Line 104: s See L100. http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.cc@112 PS4, Line 112: if (partition_key_expr_evals_.empty()) { Optional: if this function is called from the other ConstructPartitionInfo() overload, this check has already been done. We could extract the rest of this function into a helper function that is called by both "top level" functions. http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.cc@113 PS4, Line 113: output_partition->iceberg_spec_id = table_desc_->IcebergSpecId(); Is there a valid use case for calling this function (and providing a 'spec_id') when 'partition_key_expr_evals_' is empty? Can 'spec_id' be different from 'table_desc_->IcebergSpecId()' If not, we should add a DCHECK; it they can be different, it's a bit unintuitive. http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.cc@159 PS4, Line 159: stringstream external_partition_name_ss; Is this ever written? http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.cc@164 PS4, Line 164: encoded It could be 'url_encoded_...' - "encoded" is ambiguous, it could also refer to Base64 encoding in this context. http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.cc@169 PS4, Line 169: string& Is it intentional that we overwrite the elements of 'partition_values_decoded' on L171? I think it's a bit less clean that having an independent variable here, though it may be faster. -- To view, visit http://gerrit.cloudera.org:8080/20760 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315 Gerrit-Change-Number: 20760 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Tamas Mate <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Mon, 11 Dec 2023 16:04:48 +0000 Gerrit-HasComments: Yes
