Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20760 )

Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
......................................................................


Patch Set 5:

(14 comments)

Yet another partial review until be/src/exec/iceberg-delete-sink-base.cc

http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-buffered-delete-sink.h
File be/src/exec/iceberg-buffered-delete-sink.h:

http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-buffered-delete-sink.h@35
PS4, Line 35: class IcebergBufferedDeleteSink : public IcebergDeleteSinkBase {
There could be a comment describing when this class is/should be used, e.g. 
what the difference between this and IcebergDeleteSink is.


http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-buffered-delete-sink.cc
File be/src/exec/iceberg-buffered-delete-sink.cc:

http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-buffered-delete-sink.cc@345
PS4, Line 345: void IcebergBufferedDeleteSink::Close(RuntimeState* state) {
See comment at iceberg-delete-sink-base.cc:66.
Possibly we could convert it into DCHECK(closed_).


http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.h
File be/src/exec/iceberg-delete-sink-base.h:

http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.h@30
PS4, Line 30: IcebergDeleteSinkBase
Could we make this class pure virtual?


http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.h@63
PS4, Line 63:   std::string HumanReadablePartitionValue(
An example would be useful.

Could you explain what 'transform_result' is? How should it be interpreted and 
how should the return value be interpreted if 'transform_status' is an error?


http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.h@70
PS4, Line 70:   DmlExecState dml_exec_state_;
Does the unbuffered IcebergDeleteSink also need a separate DmlExecState? Is 
that used for updates too? Or are updates always handled by the buffered 
IcebergBufferedDeleteSink?


http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.cc
File be/src/exec/iceberg-delete-sink-base.cc:

http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.cc@66
PS4, Line 66:   if (closed_) return;
The 'closed_' field is a bit strange, AFAICS it comes from DataSink and the 
comment there writes that "subclasses should check and set this in Close()." 
But in DataSink::Close() 'closed_' is set to true.

Then some subclasses like TableSinkBase and this one don't set it while others 
like IcebergBufferedDeleteSink do.

I think either the comment at 'DataSink::closed_' should be updated (if 
DataSink is responsible for setting 'closed_') OR we should have all concrete 
subclasses set it (and make intermediate classes that are never instantiated 
into pure virtual classes).


http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.cc@88
PS4, Line 88:     OutputPartition* output_partition) {
Nit: doesn't it fit on the previous line?


http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.cc@100
PS4, Line 100: s
Why is it plural? Maybe it could be 'partition_values_eval', like the argument 
on L111.


http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.cc@104
PS4, Line 104: s
See L100.


http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.cc@112
PS4, Line 112:   if (partition_key_expr_evals_.empty()) {
Optional: if this function is called from the other ConstructPartitionInfo() 
overload, this check has already been done. We could extract the rest of this 
function into a helper function that is called by both "top level" functions.


http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.cc@113
PS4, Line 113:     output_partition->iceberg_spec_id = 
table_desc_->IcebergSpecId();
Is there a valid use case for calling this function (and providing a 'spec_id') 
when 'partition_key_expr_evals_' is empty? Can 'spec_id' be different from 
'table_desc_->IcebergSpecId()' If not, we should add a DCHECK; it they can be 
different, it's a bit unintuitive.


http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.cc@159
PS4, Line 159:   stringstream external_partition_name_ss;
Is this ever written?


http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.cc@164
PS4, Line 164: encoded
It could be 'url_encoded_...' - "encoded" is ambiguous, it could also refer to 
Base64 encoding in this context.


http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.cc@169
PS4, Line 169: string&
Is it intentional that we overwrite the elements of 'partition_values_decoded' 
on L171? I think it's a bit less clean that having an independent variable 
here, though it may be faster.



--
To view, visit http://gerrit.cloudera.org:8080/20760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315
Gerrit-Change-Number: 20760
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tamas Mate <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Mon, 11 Dec 2023 16:04:48 +0000
Gerrit-HasComments: Yes

Reply via email to