Hello Daniel Becker, Csaba Ringhofer, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/21718
to look at the new patch set (#6).
Change subject: IMPALA-13325: Use RowBatch::CopyRows in IcebergDeleteNode
......................................................................
IMPALA-13325: Use RowBatch::CopyRows in IcebergDeleteNode
Typically there are many more data records than delete records in a
healthy Iceberg table. This means it is suboptimal to copy probe rows
one by one in the IcebergDeleteNode. With this patch we switch to the
RowBatch::CopyRows method to copy tuple rows in batches.
We also switch to an iterator-based approach when we test for
deleted rows, which seems to be more efficient than ContainsBulk().
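The idea can be illustrated with a minimal, standalone sketch. It is not
the code in this patch: Rows, CopyRun and FilterDeleted are hypothetical
names, a sorted std::vector stands in for the RoaringBitmap iterator, and
a vector insert stands in for RowBatch::CopyRows. It assumes the probe
rows of a batch have consecutive file positions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a row batch: one value per row.
using Rows = std::vector<int64_t>;

// Appends src[start, start + count) to dst in a single call. This plays the
// role of a bulk copy; it is not the real RowBatch::CopyRows signature.
inline void CopyRun(const Rows& src, size_t start, size_t count, Rows* dst) {
  dst->insert(dst->end(), src.begin() + start, src.begin() + start + count);
}

// Returns the rows of 'probe' whose file position is not deleted.
// Assumptions: probe[i] has file position first_pos + i, and
// 'deleted_positions' is sorted in ascending order.
Rows FilterDeleted(const Rows& probe, uint64_t first_pos,
                   const std::vector<uint64_t>& deleted_positions) {
  assert(std::is_sorted(deleted_positions.begin(), deleted_positions.end()));
  Rows out;
  out.reserve(probe.size());
  const uint64_t end_pos = first_pos + probe.size();

  // Seek once to the first deleted position that can affect this batch,
  // then only move forward instead of doing one lookup per row.
  auto it = std::lower_bound(deleted_positions.begin(),
                             deleted_positions.end(), first_pos);
  uint64_t cur = first_pos;
  while (cur < end_pos) {
    // Next deleted position inside the batch, or end_pos if none remain.
    const uint64_t next_deleted =
        (it != deleted_positions.end() && *it < end_pos) ? *it : end_pos;
    // Everything in [cur, next_deleted) survives: copy it as one run.
    if (next_deleted > cur) {
      CopyRun(probe, cur - first_pos, next_deleted - cur, &out);
    }
    cur = next_deleted + 1;  // Skip the deleted row itself.
    if (it != deleted_positions.end()) ++it;
  }
  return out;
}
```

When deletes are sparse, most of the batch is copied in a few long runs,
which is where the per-row approach loses time.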
I measured the Avg Time of the DELETE EVENTS ICEBERG DELETE operator.
Local Measurements
+--------------+----------------+--------------------+--------------------+
| Data records | Delete records | Old implementation | New implementation |
+--------------+----------------+--------------------+--------------------+
| 2 Billion    | 1 Billion      | 15.82s             | 14.73s             |
| 1.2 Billion  | 70 Million     | 5.64s              | 2.4s               |
+--------------+----------------+--------------------+--------------------+
Large-scale Measurements
1 Coordinator, 10 executors.
+--------------+----------------+--------------------+--------------------+
| Data records | Delete records | Old implementation | New implementation |
+--------------+----------------+--------------------+--------------------+
| 405 Billion  | 68.5 Billion   | 87.30s             | 54.76s             |
| 301 Billion  | 18 Billion     | 67.38s             | 25.31s             |
+--------------+----------------+--------------------+--------------------+
1 Coordinator, 40 executors.
+--------------+----------------+--------------------+--------------------+
| Data records | Delete records | Old implementation | New implementation |
+--------------+----------------+--------------------+--------------------+
| 405 Billion  | 68.5 Billion   | 23.18s             | 14.72s             |
| 301 Billion  | 18 Billion     | 16.52s             | 6.09s              |
+--------------+----------------+--------------------+--------------------+
Testing
* added unit tests for the new methods of RoaringBitmap
Change-Id: I46487fefa300027e9df6cd7fb36c78af01dd56c1
---
M be/src/exec/iceberg-delete-node.cc
M be/src/exec/iceberg-delete-node.h
M be/src/runtime/row-batch.h
M be/src/util/roaring-bitmap-test.cc
M be/src/util/roaring-bitmap.h
5 files changed, 303 insertions(+), 74 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/21718/6
--
To view, visit http://gerrit.cloudera.org:8080/21718
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I46487fefa300027e9df6cd7fb36c78af01dd56c1
Gerrit-Change-Number: 21718
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>