Hello Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/21718

to look at the new patch set (#3).

Change subject: IMPALA-13325: Use RowBatch::CopyRows in IcebergDeleteNode
......................................................................

IMPALA-13325: Use RowBatch::CopyRows in IcebergDeleteNode

Typically there are far more data records than delete records in a
healthy Iceberg table, so copying probe rows one by one in the
IcebergDeleteNode is suboptimal. With this patch we switch to the
RowBatch::CopyRows method to copy tuple rows in batches.
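A minimal sketch of run-based copying follows. It assumes a per-row
deleted flag has already been computed. ProbeRow and CopySurvivingRuns
are hypothetical names, and std::vector stands in for a RowBatch. This
illustrates the idea and is not the actual RowBatch::CopyRows code,
whose signature is not shown here. Each maximal run of surviving rows
is appended with one range insert. When deletes are sparse, a whole
batch therefore takes a few bulk copies instead of one call per row.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a tuple row; only its file position is kept.
struct ProbeRow {
  std::int64_t file_pos;
};

// Appends every row of `probe` whose `is_deleted` flag is false to `out`.
// Maximal runs of consecutive surviving rows are appended with a single
// range insert each, instead of appending surviving rows one at a time.
// Returns the number of bulk copies performed.
int CopySurvivingRuns(const std::vector<ProbeRow>& probe,
                      const std::vector<bool>& is_deleted,
                      std::vector<ProbeRow>* out) {
  assert(is_deleted.size() == probe.size());
  int num_copies = 0;
  std::size_t run_start = 0;
  for (std::size_t i = 0; i <= probe.size(); ++i) {
    // A run of surviving rows ends at a deleted row or at the batch end.
    const bool run_ends = (i == probe.size()) || is_deleted[i];
    if (!run_ends) continue;
    if (i > run_start) {
      out->insert(out->end(), probe.begin() + run_start, probe.begin() + i);
      ++num_copies;
    }
    run_start = i + 1;
  }
  return num_copies;
}
```

For example, with ten rows where rows 3 and 7 are deleted, the sketch
performs three bulk copies ([0,3), [4,7) and [8,10)) and produces
eight rows.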

We also switch to an iterator-based approach for testing deleted
rows, which seems to be more efficient than ContainsBulk().
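A sketch of an iterator-based check follows. It assumes that the file
positions of the probe rows in a batch are in ascending order.
std::set stands in for RoaringBitmap, and MarkDeleted is a
hypothetical name. The RoaringBitmap iterator methods added by this
patch are not reproduced here. One lower_bound search positions the
iterator, and after that it only moves forward. This merges the two
sorted sequences instead of issuing an independent membership query
for every row.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Marks which of the batch's file positions appear in `deleted`.
// Assumptions (not Impala's API): `positions` is sorted ascending, and
// std::set stands in for the RoaringBitmap of deleted positions.
std::vector<bool> MarkDeleted(const std::vector<std::int64_t>& positions,
                              const std::set<std::int64_t>& deleted) {
  std::vector<bool> is_deleted(positions.size(), false);
  if (positions.empty()) return is_deleted;
  // Single search for the first candidate; afterwards only forward steps.
  auto it = deleted.lower_bound(positions.front());
  for (std::size_t i = 0; i < positions.size(); ++i) {
    assert(i == 0 || positions[i - 1] <= positions[i]);
    // Skip deleted positions smaller than this row's position.
    while (it != deleted.end() && *it < positions[i]) ++it;
    if (it == deleted.end()) break;  // No further deletes affect this batch.
    is_deleted[i] = (*it == positions[i]);
  }
  return is_deleted;
}
```

For positions {10, 11, 12, 15, 20} and deleted positions
{5, 11, 15, 16, 30}, the result marks rows 1 and 3 (positions 11 and
15) as deleted.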

I measured the Avg Time of the DELETE EVENTS ICEBERG DELETE operator.

Local Measurements
+--------------+----------------+--------------------+--------------------+
| Data records | Delete records | Old implementation | New implementation |
+--------------+----------------+--------------------+--------------------+
| 2 Billion    | 1 Billion      | 15.82s             | 14.73s             |
| 1.2 Billion  | 70 Million     | 5.64s              | 2.4s               |
+--------------+----------------+--------------------+--------------------+

Large-scale measurements
1 Coordinator, 10 executors.
+--------------+----------------+--------------------+--------------------+
| Data records | Delete records | Old implementation | New implementation |
+--------------+----------------+--------------------+--------------------+
| 405 Billion  | 68.5 Billion   | 87.30s             | 54.76s             |
| 301 Billion  | 18 Billion     | 67.38s             | 25.31s             |
+--------------+----------------+--------------------+--------------------+

1 Coordinator, 10 executors.
+--------------+----------------+--------------------+--------------------+
| Data records | Delete records | Old implementation | New implementation |
+--------------+----------------+--------------------+--------------------+
| 405 Billion  | 68.5 Billion   | 23.18s             | 14.72s             |
| 301 Billion  | 18 Billion     | 16.52s             | 6.09s              |
+--------------+----------------+--------------------+--------------------+
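The relative gains follow from the tables by dividing the old time by
the new time. The small helper below encodes the reported numbers
(record counts in billions, times in seconds) to make the arithmetic
explicit. Measurement, Speedup and DeleteRatio are illustrative names
only.

```cpp
#include <cstddef>

// One row of the measurement tables: record counts in billions,
// average operator times in seconds.
struct Measurement {
  double data_records;
  double delete_records;
  double old_secs;
  double new_secs;
};

// Values copied from the tables above, in table order.
constexpr Measurement kMeasurements[] = {
    {2.0, 1.0, 15.82, 14.73},     // Local
    {1.2, 0.07, 5.64, 2.4},       // Local
    {405.0, 68.5, 87.30, 54.76},  // Large-scale, first table
    {301.0, 18.0, 67.38, 25.31},  // Large-scale, first table
    {405.0, 68.5, 23.18, 14.72},  // Large-scale, second table
    {301.0, 18.0, 16.52, 6.09},   // Large-scale, second table
};

constexpr std::size_t kNumMeasurements =
    sizeof(kMeasurements) / sizeof(kMeasurements[0]);

// How many times faster the new implementation is.
constexpr double Speedup(const Measurement& m) {
  return m.old_secs / m.new_secs;
}

// Fraction of data records that are deleted.
constexpr double DeleteRatio(const Measurement& m) {
  return m.delete_records / m.data_records;
}
```

This gives speedups of about 1.07x and 2.35x locally, 1.59x and 2.66x
in the first large-scale table, and 1.57x and 2.71x in the second. The
largest gains occur where deletes are about 6% of the data records
(70 Million of 1.2 Billion, 18 Billion of 301 Billion). The smallest
gain occurs where half of the records are deleted. This is consistent
with bulk copies covering longer runs of surviving rows when deletes
are sparse.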

Testing
 * added unit tests for the new methods of RoaringBitmap

Change-Id: I46487fefa300027e9df6cd7fb36c78af01dd56c1
---
M be/src/exec/iceberg-delete-node.cc
M be/src/exec/iceberg-delete-node.h
M be/src/runtime/row-batch.h
M be/src/util/roaring-bitmap-test.cc
M be/src/util/roaring-bitmap.h
5 files changed, 245 insertions(+), 72 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/21718/3
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I46487fefa300027e9df6cd7fb36c78af01dd56c1
Gerrit-Change-Number: 21718
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
