Hello Daniel Becker, Kurt Deschler, Michael Smith, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/21557
to look at the new patch set (#2).
Change subject: IMPALA-13088: Use RoaringBitmap instead of sorted vector of
int64s
......................................................................
IMPALA-13088: Use RoaringBitmap instead of sorted vector of int64s
This patch replaces the sorted 64-bit integer vectors that we
use in IcebergDeleteNode with 64-bit roaring bitmaps. We use the
CRoaring library (version 4.0.0). CRoaring also offers C++ classes,
but this patch adds its own thin C++ wrapper class around the C
functions to get the best performance.
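The sketch below is a minimal, hypothetical C++ roaring-style 64-bit set. It shows why a roaring bitmap is so much smaller than a sorted vector of int64s.

It is NOT CRoaring's implementation, which uses a more elaborate layout. It is also not the RoaringBitmap64 wrapper added by this patch. The class name SimpleRoaring64 and its methods are made up for illustration.

Each position is split into a 48-bit high key and a 16-bit low part. The low parts of each key are kept in one of two containers:
 - a sorted uint16_t array, for sparse data;
 - a 65536-bit bitset, once the array grows past 4096 entries (dense data).

```cpp
#include <algorithm>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

// Hypothetical, simplified roaring-style set of 64-bit positions (illustration
// only; not CRoaring and not the patch's RoaringBitmap64).
class SimpleRoaring64 {
 public:
  void Add(uint64_t v) {
    Container& c = containers_[v >> 16];            // high 48 bits select a container
    uint16_t low = static_cast<uint16_t>(v & 0xFFFF);
    if (c.bits) {                                    // dense container: set one bit
      c.bits->set(low);
      return;
    }
    auto it = std::lower_bound(c.array.begin(), c.array.end(), low);
    if (it != c.array.end() && *it == low) return;   // already present
    c.array.insert(it, low);                         // keep the array sorted
    // Past 4096 entries the array (2 bytes each) would exceed the 8 KiB bitset,
    // so convert, mirroring the classic roaring array/bitmap threshold.
    if (c.array.size() > kArrayMax) {
      c.bits = std::make_unique<std::bitset<65536>>();
      for (uint16_t x : c.array) c.bits->set(x);
      c.array.clear();
      c.array.shrink_to_fit();
    }
  }

  bool Contains(uint64_t v) const {
    auto it = containers_.find(v >> 16);
    if (it == containers_.end()) return false;
    uint16_t low = static_cast<uint16_t>(v & 0xFFFF);
    const Container& c = it->second;
    if (c.bits) return c.bits->test(low);
    return std::binary_search(c.array.begin(), c.array.end(), low);
  }

  uint64_t Cardinality() const {
    uint64_t n = 0;
    for (const auto& kv : containers_) {
      const Container& c = kv.second;
      n += c.bits ? c.bits->count() : c.array.size();
    }
    return n;
  }

 private:
  static constexpr std::size_t kArrayMax = 4096;
  struct Container {
    std::vector<uint16_t> array;                   // sparse: sorted low 16 bits
    std::unique_ptr<std::bitset<65536>> bits;      // dense: one bit per low value
  };
  std::map<uint64_t, Container> containers_;       // keyed by the high 48 bits
};
```

In this sketch a dense run of deleted rows costs at most 8 KiB per 65536-position chunk, instead of up to 512 KiB as int64s. This happens, for example, when many consecutive rows are deleted from the same data file. Sparse positions cost 2 bytes each instead of 8 (plus per-container overhead).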
Performance
I used an extended version of the "One Trillion Row" challenge: after
inserting 1 trillion records into a table, I also deleted/updated many
records. In the end the table had 1 trillion data records and ~68.5
billion delete records.
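The memory needed by sorted vectors at this scale can be estimated with simple arithmetic. The sketch assumes each delete record contributes one 64-bit position. Under that assumption, the raw payload alone is 68.5 billion x 8 bytes, about 548 GB.

This is a back-of-the-envelope calculation, not a measurement. It ignores vector growth slack, per-file bookkeeping and allocator overhead. The per-executor figure additionally assumes, hypothetically, that positions are spread evenly across executors.

```cpp
#include <cassert>
#include <cstdint>

// ~68.5 billion delete positions, one int64 each (assumption: 1 record -> 1 position).
constexpr uint64_t kDeletePositions = 68'500'000'000ULL;
constexpr uint64_t kBytesPerPosition = sizeof(int64_t);  // 8 bytes

// Raw payload of sorted int64 vectors, excluding any container overhead.
constexpr uint64_t RawVectorBytes(uint64_t positions) {
  return positions * kBytesPerPosition;
}

// Hypothetical even split of the raw payload across executors.
constexpr uint64_t PerExecutorBytes(uint64_t positions, uint64_t executors) {
  return RawVectorBytes(positions) / executors;
}

static_assert(RawVectorBytes(kDeletePositions) == 548'000'000'000ULL,
              "about 548 GB of raw positions in total");
```

Under the even-split assumption, each executor in the 10-executor cluster would hold about 54.8 GB of raw positions. In the 40-executor cluster the figure would be about 13.7 GB.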
For the measurements I used clusters with 10 and 40 executors, and
executed the following query:
SELECT station, min(measure), max(measure), avg(measure)
FROM measurements_extra_1trc_partitioned
GROUP BY 1
ORDER BY 1;
JOIN BUILD times:
+----------------+--------------+--------------+
| Implementation | 10 executors | 40 executors |
+----------------+--------------+--------------+
| Sorted vectors | CRASH | 2m3s |
| Roaring bitmap | 6m35s | 1m51s |
+----------------+--------------+--------------+
The 10-executor cluster with sorted vectors failed to run the query
because the executors crashed after running out of memory.
Memory usage (VmRSS) for 10 executors:
+----------------+------------------------+
| Implementation | 10 executors |
+----------------+------------------------+
| Sorted vectors | 54.4 GB (before CRASH) |
| Roaring bitmap | 7.4 GB |
+----------------+------------------------+
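The relative improvements can be computed directly from the two tables. This small C++ sketch only restates the reported numbers.

The sorted-vector VmRSS was captured before the crash, so the true requirement was at least 54.4 GB. The memory ratio below is therefore a lower bound.

```cpp
#include <cassert>

// Figures taken from the JOIN BUILD and VmRSS tables above.
constexpr double kSortedJoinBuildSec40 = 2 * 60 + 3;    // 2m3s, 40 executors
constexpr double kRoaringJoinBuildSec40 = 1 * 60 + 51;  // 1m51s, 40 executors
constexpr double kSortedRssGb10 = 54.4;                 // 10 executors, before crash
constexpr double kRoaringRssGb10 = 7.4;                 // 10 executors

// Relative reduction: (before - after) / before.
constexpr double Reduction(double before, double after) {
  return (before - after) / before;
}

// Memory ratio; a lower bound because the sorted-vector run crashed.
constexpr double kRssRatio = kSortedRssGb10 / kRoaringRssGb10;
```

On 40 executors, JOIN BUILD time drops by about 10% (123 s to 111 s). On 10 executors, VmRSS drops by a factor of at least about 7.35 (54.4 GB to 7.4 GB).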
Testing:
* added tests for RoaringBitmap64
Change-Id: Ib769965d094149e99c43e0044914d9ecccc76107
---
M be/src/exec/iceberg-delete-builder.cc
M be/src/exec/iceberg-delete-builder.h
M be/src/exec/iceberg-delete-node.cc
M be/src/exec/iceberg-delete-node.h
A be/src/thirdparty/roaring/LICENSE
A be/src/thirdparty/roaring/README.md
A be/src/thirdparty/roaring/roaring.c
A be/src/thirdparty/roaring/roaring.h
M be/src/util/CMakeLists.txt
A be/src/util/roaring-bitmap-test.cc
A be/src/util/roaring-bitmap.h
M bin/rat_exclude_files.txt
12 files changed, 29,367 insertions(+), 274 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/21557/2
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib769965d094149e99c43e0044914d9ecccc76107
Gerrit-Change-Number: 21557
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>