Zoltan Borok-Nagy has uploaded this change for review. (
http://gerrit.cloudera.org:8080/21557 )


Change subject: IMPALA-13088: Use RoaringBitmap instead of sorted vector of 
int64s
......................................................................

IMPALA-13088: Use RoaringBitmap instead of sorted vector of int64s

This patch replaces the sorted 64-bit integer vectors that we
use in IcebergDeleteNode with 64-bit roaring bitmaps. We use the
CRoaring library, and this patch also adds a thin C++ wrapper around
its C functions to get the best performance.
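
To illustrate why a bitmap can be much smaller than a sorted vector of
positions, here is a minimal, self-contained sketch. It is NOT the
RoaringBitmap64 wrapper from this patch and it does not use CRoaring;
ToyChunkedBitmap64, CountSurvivingRows and ToyExample are hypothetical
names invented for this example. It only mimics the basic idea of
splitting each 64-bit value into a high part that selects a chunk and
a low 16-bit part stored as one bit in that chunk. Real roaring
bitmaps additionally switch between several container types, which
this toy does not attempt.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>

// Hypothetical toy class for illustration only (not the patch's code).
// Values are grouped by their upper 48 bits; each group stores the
// lower 16 bits as a fixed 65536-bit (8 KiB) bitset.
class ToyChunkedBitmap64 {
 public:
  void Add(uint64_t value) {
    std::unique_ptr<Chunk>& chunk = chunks_[value >> 16];
    if (chunk == nullptr) chunk = std::make_unique<Chunk>();
    chunk->set(value & 0xFFFF);
  }

  bool Contains(uint64_t value) const {
    auto it = chunks_.find(value >> 16);
    return it != chunks_.end() && it->second->test(value & 0xFFFF);
  }

  std::size_t NumChunks() const { return chunks_.size(); }

 private:
  using Chunk = std::bitset<65536>;
  std::map<uint64_t, std::unique_ptr<Chunk>> chunks_;
};

// Counts the row positions in [0, num_rows) that are not marked as
// deleted, similar in spirit to filtering rows by delete position.
inline uint64_t CountSurvivingRows(const ToyChunkedBitmap64& deleted,
                                   uint64_t num_rows) {
  uint64_t survivors = 0;
  for (uint64_t pos = 0; pos < num_rows; ++pos) {
    if (!deleted.Contains(pos)) ++survivors;
  }
  return survivors;
}

// Small usage example: delete 3 positions out of 100000 rows.
inline void ToyExample() {
  ToyChunkedBitmap64 deleted;
  deleted.Add(1);
  deleted.Add(2);
  deleted.Add(70000);  // falls into the second chunk (70000 >> 16 == 1)
  assert(deleted.Contains(70000));
  assert(!deleted.Contains(3));
  assert(CountSurvivingRows(deleted, 100000) == 99997);
}
```

In this toy, a fully populated chunk covers 65536 positions in 8 KiB,
while the same positions in a vector of int64s need 512 KiB. A sparse
chunk is wasteful here, which is one of the cases real roaring bitmaps
handle with other container types.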

Performance:
I used an extended version of the "One Trillion Row" challenge: after
inserting 1 trillion records into a table, I also deleted / updated
lots of records. In the end the table had 1 trillion data records
and ~68.5 billion delete records.

For the measurements I used clusters with 10 and 40 executors, and
executed the following query:

 SELECT station, min(measure), max(measure), avg(measure)
 FROM measurements_extra_1trc_partitioned
 GROUP BY 1
 ORDER BY 1;

JOIN BUILD times:
+----------------+--------------+--------------+
| Implementation | 10 executors | 40 executors |
+----------------+--------------+--------------+
| Sorted vectors | CRASH        | 2m3s         |
| Roaring bitmap | 6m35s        | 1m51s        |
+----------------+--------------+--------------+

The 10-executor cluster with sorted vectors failed to run the query
because the executors crashed after running out of memory.

Memory usage (VmRSS) for 10 executors:
+----------------+------------------------+
| Implementation |      10 executors      |
+----------------+------------------------+
| Sorted vectors | 54.4 GB (before CRASH) |
| Roaring bitmap | 7.4 GB                 |
+----------------+------------------------+
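
As a rough sanity check on the sorted-vector number, the raw size of
the delete positions can be estimated as below. This is only a
back-of-the-envelope sketch: it counts 8 bytes per position, ignores
vector over-allocation and any per-file bookkeeping, and the even
split across the 10 executors is an assumption rather than something
measured. VmRSS also includes memory unrelated to delete positions,
and the 54.4 GB above was recorded before the crash.

```cpp
#include <cstdint>

// ~68.5 billion delete records, as stated in the description above.
constexpr uint64_t kDeleteRecords = 68500000000ULL;
// A sorted vector of int64s stores 8 bytes per delete position.
constexpr uint64_t kBytesPerPosition = sizeof(int64_t);
// Raw payload across the whole cluster: 548,000,000,000 bytes (548 GB).
constexpr uint64_t kTotalBytes = kDeleteRecords * kBytesPerPosition;
static_assert(kTotalBytes == 548000000000ULL, "548 GB of raw positions");

// ASSUMPTION: positions are spread evenly over 10 executors.
constexpr int kExecutors = 10;
// Roughly 54.8 GB of raw positions per executor under that assumption.
constexpr double kGbPerExecutor = kTotalBytes / 1e9 / kExecutors;
```

Under these assumptions the estimate lands in the same range as the
observed VmRSS, which is consistent with the delete positions
dominating memory usage in the sorted-vector case.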

Testing:
 * TODO: unit tests for RoaringBitmap64

Change-Id: Ib769965d094149e99c43e0044914d9ecccc76107
---
M be/src/exec/iceberg-delete-builder.cc
M be/src/exec/iceberg-delete-builder.h
M be/src/exec/iceberg-delete-node.cc
M be/src/exec/iceberg-delete-node.h
A be/src/thirdparty/roaring/LICENSE
A be/src/thirdparty/roaring/README.md
A be/src/thirdparty/roaring/roaring.c
A be/src/thirdparty/roaring/roaring.h
M be/src/util/CMakeLists.txt
A be/src/util/roaring-bitmap-test.cc
A be/src/util/roaring-bitmap.h
11 files changed, 29,336 insertions(+), 274 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/21557/1
--
To view, visit http://gerrit.cloudera.org:8080/21557

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib769965d094149e99c43e0044914d9ecccc76107
Gerrit-Change-Number: 21557
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
