Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/21718 )
Change subject: IMPALA-13325: Use RowBatch::CopyRows in IcebergDeleteNode ...................................................................... Patch Set 5: (7 comments) http://gerrit.cloudera.org:8080/#/c/21718/5/be/src/exec/iceberg-delete-node.cc File be/src/exec/iceberg-delete-node.cc: http://gerrit.cloudera.org:8080/#/c/21718/5/be/src/exec/iceberg-delete-node.cc@309 PS5, Line 309: auto NextProbePos = [this, &probe_it]() { > Yes, it does. The iterator shouldn't go beyond the end of the row batch bec I think that this is a bit confusing and doesn't remove much duplication. What about moving the DCHECK to ProbeFilePosition() and call the rest of the lines directly? As a side note I think that the code could be a bit faster by assuming that the probe side has only 1 tuple per row (is this right?) and using a simple Tuple** instead of RowBatch::Iterator. http://gerrit.cloudera.org:8080/#/c/21718/5/be/src/exec/iceberg-delete-node.cc@319 PS5, Line 319: ++rows_to_copy; When most rows are not deleted it could be faster to use binary search in the probe batch to find a probe row where current_probe_pos >= next_deleted_pos. http://gerrit.cloudera.org:8080/#/c/21718/5/be/src/exec/iceberg-delete-node.cc@341 PS5, Line 341: : impala::StringValue* file_path = : probe_batch_iterator.Get()->GetTuple(0)->GetStringSlot(file_path_offset_); It's been a long time since I looked at this area of the code - can we assume that all rows in a batch have the same file path on a the probe side? A comment and maybe some DCHECKs to verify this would be nice. http://gerrit.cloudera.org:8080/#/c/21718/5/be/src/util/roaring-bitmap-test.cc File be/src/util/roaring-bitmap-test.cc: http://gerrit.cloudera.org:8080/#/c/21718/5/be/src/util/roaring-bitmap-test.cc@88 PS5, Line 88: If I didn't miss something then currently we don't test the case when MoveAndGetEqualOrLarger() hits the end. Can you add a test test where there is a big gap between hitting the end, e.g. 3000001 directly after 14? http://gerrit.cloudera.org:8080/#/c/21718/5/be/src/util/roaring-bitmap-test.cc@94 PS5, Line 94: DenseMapIteration Can you add a similar test where the second loop calls GetEqualOrLarger less frequently than MOVE_THRESHOLD? http://gerrit.cloudera.org:8080/#/c/21718/5/be/src/util/roaring-bitmap.h File be/src/util/roaring-bitmap.h: http://gerrit.cloudera.org:8080/#/c/21718/5/be/src/util/roaring-bitmap.h@71 PS5, Line 71: HasValue This doesn't seem to be used at other places, could be private. http://gerrit.cloudera.org:8080/#/c/21718/5/be/src/util/roaring-bitmap.h@87 PS5, Line 87: // Incoming 'x' values must be in ascending order. Can you add a DCHECK for this? My understanding is that x > Value() must be always true until we hit the end of the bitmap. -- To view, visit http://gerrit.cloudera.org:8080/21718 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I46487fefa300027e9df6cd7fb36c78af01dd56c1 Gerrit-Change-Number: 21718 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Mon, 14 Oct 2024 13:22:20 +0000 Gerrit-HasComments: Yes
