Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20760 )

Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
......................................................................


Patch Set 3:

(19 comments)

Thanks Zoltán, great patch! So far I've only been able to review it up until 
iceberg-buffered-delete-sink.cc.

http://gerrit.cloudera.org:8080/#/c/20760/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20760/2//COMMIT_MSG@26
PS2, Line 26: FlushFinal
Can it spill to disk if the data it has received can't be stored in memory?


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.h
File be/src/exec/iceberg-buffered-delete-sink.h:

http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.h@48
PS3, Line 48: to the temporary Hdfs files
It can be misunderstood, one may thing that we write _into_ Hdfs files, like 
spilling to disk.


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.h@69
PS3, Line 69:   class FilePositionsIterator {
Can this be put into the .cc file?


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.h@74
PS3, Line 74:       pos_level_it_ = file_level_it_->second.begin();
Shouldn't we check whether file_pos is empty?


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.h@78
PS3, Line 78:       StringValue filepath = file_level_it_->first;
We should check for emptiness.


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.h@83
PS3, Line 83:     bool HasNext() { return file_level_it_ != file_level_end_; }
Does HasNext() mean we can call Get() or that there is a valid item after the 
current one?

In the first case is misunderstandable.

If it is we have a valid item after the last one I think there's an off-by-one 
error: when we're currently at the last file position of the last file, both 
'file_level_it_' and 'pos_level_it_' point to valid fields (HasNext() returns 
true) but there is no valid next element.

See also L98. The semantics of Get(), Next() and HasNext() could be clarified. 
We could also eliminate Get() and have Next() return the next value.


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.h@96
PS3, Line 96:       DCHECK(file_level_it_ != file_level_end_);
A comment on Next() could indicate that Next() shouldn't be called when 
iteration is over.


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.h@98
PS3, Line 98: file_level_it_
After incrementing 'file_level_it_' on the previous line, it may now point 
past-the-end, so we can't dereference it. See also L83.


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.h@121
PS3, Line 121:   /// Verifies that there are no duplicates in the buffered 
delete records.
We could include in the comment that it should be called after 
SortBufferedRecords().


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.h@151
PS3, Line 151: server
Nit: serve?


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.cc
File be/src/exec/iceberg-buffered-delete-sink.cc:

http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.cc@124
PS3, Line 124: filepath_sv
We could DCHECK that filepath_sv is not NULL.


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.cc@149
PS3, Line 149: auto
Optional: to ease understanding, we could use actual types instead of auto in 
these loops.


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.cc@161
PS3, Line 161: auto
See L149.


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.cc@184
PS3, Line 184: auto
See L149.


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.cc@196
PS3, Line 196: file_and_pos.first
We could use 'file' for 'file_and_pos.first' here and with the Len() too.


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.cc@218
PS3, Line 218: s
Isn't it only one partition? Maybe 'partition_encoded' would be more 
understandable?


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.cc@273
PS3, Line 273:   // we might need to delete rows from partition 
"col_trunc=1000" with both spec ids.
How is this comment connected to the next line? Could you explain?


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.cc@288
PS3, Line 288: buffer
Shouldn't it be '*buffer'?


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.cc@320
PS3, Line 320:   int64_t* pos_slot = 
row->GetTuple(position_ref->GetTupleIdx())->
Just curious, can filepath_ref->GetTupleIdx() be different from 
position_ref->GetTupleIdx()? If not we could add a DCHECK.



--
To view, visit http://gerrit.cloudera.org:8080/20760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315
Gerrit-Change-Number: 20760
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Comment-Date: Fri, 08 Dec 2023 17:07:52 +0000
Gerrit-HasComments: Yes

Reply via email to