Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21718 )

Change subject: IMPALA-13325: Use RowBatch::CopyRows in IcebergDeleteNode
......................................................................


Patch Set 5:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/21718/5/be/src/exec/iceberg-delete-node.cc
File be/src/exec/iceberg-delete-node.cc:

http://gerrit.cloudera.org:8080/#/c/21718/5/be/src/exec/iceberg-delete-node.cc@309
PS5, Line 309:   auto NextProbePos = [this, &probe_it]() {
> Yes, it does. The iterator shouldn't go beyond the end of the row batch bec
I think that this is a bit confusing and doesn't remove much duplication. What 
about moving the DCHECK to ProbeFilePosition() and call the rest of the lines 
directly?

As a side note I think that the code could be a bit faster by assuming that the 
probe side has only 1 tuple per row (is this right?) and using a simple Tuple** 
instead of RowBatch::Iterator.


http://gerrit.cloudera.org:8080/#/c/21718/5/be/src/exec/iceberg-delete-node.cc@319
PS5, Line 319:     ++rows_to_copy;
When most rows are not deleted it could be faster to use binary search in the 
probe batch to find a probe row where current_probe_pos >= next_deleted_pos.


http://gerrit.cloudera.org:8080/#/c/21718/5/be/src/exec/iceberg-delete-node.cc@341
PS5, Line 341:
             :   impala::StringValue* file_path =
             :       
probe_batch_iterator.Get()->GetTuple(0)->GetStringSlot(file_path_offset_);
It's been a long time since I looked at this area of the code - can we assume 
that all rows in a batch have the same file path on a the probe side? A comment 
and maybe some DCHECKs to verify this would be nice.


http://gerrit.cloudera.org:8080/#/c/21718/5/be/src/util/roaring-bitmap-test.cc
File be/src/util/roaring-bitmap-test.cc:

http://gerrit.cloudera.org:8080/#/c/21718/5/be/src/util/roaring-bitmap-test.cc@88
PS5, Line 88:
If I didn't miss something then currently we don't test the case when 
MoveAndGetEqualOrLarger() hits the end. Can you add a test test where there is 
a big gap between hitting the end, e.g. 3000001 directly after 14?


http://gerrit.cloudera.org:8080/#/c/21718/5/be/src/util/roaring-bitmap-test.cc@94
PS5, Line 94: DenseMapIteration
Can you add a similar test where the second loop calls GetEqualOrLarger less 
frequently than MOVE_THRESHOLD?


http://gerrit.cloudera.org:8080/#/c/21718/5/be/src/util/roaring-bitmap.h
File be/src/util/roaring-bitmap.h:

http://gerrit.cloudera.org:8080/#/c/21718/5/be/src/util/roaring-bitmap.h@71
PS5, Line 71: HasValue
This doesn't seem to be used at other places, could be private.


http://gerrit.cloudera.org:8080/#/c/21718/5/be/src/util/roaring-bitmap.h@87
PS5, Line 87:       // Incoming 'x' values must be in ascending order.
Can you add a DCHECK for this? My understanding is that x > Value() must be 
always true until we hit the end of the bitmap.



--
To view, visit http://gerrit.cloudera.org:8080/21718
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46487fefa300027e9df6cd7fb36c78af01dd56c1
Gerrit-Change-Number: 21718
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Mon, 14 Oct 2024 13:22:20 +0000
Gerrit-HasComments: Yes

Reply via email to