Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21557 )
Change subject: IMPALA-13088: Use RoaringBitmap instead of sorted vector of int64s
......................................................................

IMPALA-13088: Use RoaringBitmap instead of sorted vector of int64s

This patch replaces the sorted 64-bit integer vectors that we use in
IcebergDeleteNode with 64-bit roaring bitmaps. We use the CRoaring
library, but this patch also adds a thin C++ wrapper around the C
functions to get the best performance.

Performance

I used an extended version of the "One Trillion Row" challenge: after
inserting 1 trillion records into a table, I also deleted / updated
lots of records. At the end I had 1 trillion data records and
~68.5 billion delete records in the table.

For the measurements I used clusters with 10 and 40 executors, and
executed the following query:

 SELECT station, min(measure), max(measure), avg(measure)
 FROM measurements_extra_1trc_partitioned
 GROUP BY 1
 ORDER BY 1;

JOIN BUILD times:
+----------------+--------------+--------------+
| Implementation | 10 executors | 40 executors |
+----------------+--------------+--------------+
| Sorted vectors | CRASH        | 2m3s         |
| Roaring bitmap | 6m35s        | 1m51s        |
+----------------+--------------+--------------+

The 10-executor cluster with sorted vectors failed to run the query
because the executors crashed after running out of memory.
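
To illustrate why a bitmap can need far less memory than a sorted vector
when many deleted positions are close together, here is a minimal,
self-contained sketch. It is not the code in this patch and it does not
use the CRoaring API: SortedPositions and ToyChunkedBitmap are
hypothetical names, and the toy bitmap only borrows the general idea of
splitting the 64-bit key space into 2^16-value chunks. A real roaring
bitmap also picks a compact container type per chunk, which this toy
does not attempt.

```cpp
#include <algorithm>
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Baseline: deleted row positions kept in a sorted vector (8 bytes per entry).
class SortedPositions {
 public:
  explicit SortedPositions(std::vector<int64_t> positions)
      : positions_(std::move(positions)) {
    std::sort(positions_.begin(), positions_.end());
  }

  // O(log n) membership test via binary search.
  bool Contains(int64_t pos) const {
    return std::binary_search(positions_.begin(), positions_.end(), pos);
  }

  // Bytes used by the stored positions (ignores vector bookkeeping).
  size_t PayloadBytes() const { return positions_.size() * sizeof(int64_t); }

 private:
  std::vector<int64_t> positions_;
};

// Toy chunked bitmap (hypothetical, NOT CRoaring): the high 48 bits of a
// position select a chunk, the low 16 bits select a bit inside that chunk.
class ToyChunkedBitmap {
 public:
  void Add(uint64_t pos) {
    auto& chunk = chunks_[pos >> kChunkBits];
    if (!chunk) chunk = std::make_unique<Chunk>();
    chunk->set(pos & kChunkMask);
  }

  bool Contains(uint64_t pos) const {
    auto it = chunks_.find(pos >> kChunkBits);
    return it != chunks_.end() && it->second->test(pos & kChunkMask);
  }

  // Bytes used by the chunk bit arrays (ignores map node overhead).
  size_t PayloadBytes() const { return chunks_.size() * (kChunkSize / 8); }

 private:
  static constexpr unsigned kChunkBits = 16;
  static constexpr uint64_t kChunkSize = uint64_t{1} << kChunkBits;
  static constexpr uint64_t kChunkMask = kChunkSize - 1;
  using Chunk = std::bitset<kChunkSize>;

  std::map<uint64_t, std::unique_ptr<Chunk>> chunks_;
};
```

For a dense run of deleted positions, the sorted vector pays 8 bytes per
position, while the toy bitmap pays a fixed 8 KiB per touched chunk no
matter how many of its 65,536 bits are set. For sparse positions the toy
would be worse than the vector, which is the case that a real roaring
bitmap's per-chunk container choice is meant to handle.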

Memory usage (VmRSS) for 10 executors:
+----------------+------------------------+
| Implementation | 10 executors           |
+----------------+------------------------+
| Sorted vectors | 54.4 GB (before CRASH) |
| Roaring bitmap | 7.4 GB                 |
+----------------+------------------------+

Testing:
 * TODO: unit tests for RoaringBitmap64

Change-Id: Ib769965d094149e99c43e0044914d9ecccc76107
---
M be/src/exec/iceberg-delete-builder.cc
M be/src/exec/iceberg-delete-builder.h
M be/src/exec/iceberg-delete-node.cc
M be/src/exec/iceberg-delete-node.h
A be/src/thirdparty/roaring/LICENSE
A be/src/thirdparty/roaring/README.md
A be/src/thirdparty/roaring/roaring.c
A be/src/thirdparty/roaring/roaring.h
M be/src/util/CMakeLists.txt
A be/src/util/roaring-bitmap-test.cc
A be/src/util/roaring-bitmap.h
11 files changed, 29,336 insertions(+), 274 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/21557/1
--
To view, visit http://gerrit.cloudera.org:8080/21557
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib769965d094149e99c43e0044914d9ecccc76107
Gerrit-Change-Number: 21557
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
