Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21435 )


Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
......................................................................

IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

When there are many delete records, the IcebergDeleteBuilder can
become a bottleneck. Since the left side of the JOIN is blocked on
the build side, any improvement we make here significantly improves
Iceberg V2 table scanning.

Improvements in this patch:

* Use a vector of vectors to collect the position delete records.
  This way we can avoid large re-allocations and copying.
* Insert large ranges from the build batches into the collected
  delete records instead of inserting records one by one.
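
The two points above can be illustrated with a minimal sketch. This is
not the code from the patch: the class ChunkedPositions, its methods,
the fixed chunk capacity, and the use of int64_t for positions are all
hypothetical names and choices made for illustration. It only shows
the general technique of appending ranges into a vector of vectors so
that a full chunk is never re-allocated or copied when more data
arrives.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical collector (not Impala's class): stores positions in
// fixed-capacity chunks, so growing the collection never copies the
// already-collected records of a full chunk.
class ChunkedPositions {
 public:
  explicit ChunkedPositions(size_t chunk_capacity)
      : chunk_capacity_(chunk_capacity) {
    assert(chunk_capacity_ > 0);
  }

  // Appends [begin, end) with range inserts, splitting the input only
  // at chunk boundaries instead of pushing elements one by one.
  void AppendRange(const int64_t* begin, const int64_t* end) {
    while (begin != end) {
      if (chunks_.empty() || chunks_.back().size() == chunk_capacity_) {
        chunks_.emplace_back();
        chunks_.back().reserve(chunk_capacity_);
      }
      std::vector<int64_t>& chunk = chunks_.back();
      size_t room = chunk_capacity_ - chunk.size();
      size_t n = std::min(room, static_cast<size_t>(end - begin));
      chunk.insert(chunk.end(), begin, begin + n);
      begin += n;
      size_ += n;
    }
  }

  // Flattens all chunks into one vector and sorts it. The flattening
  // step is an assumption of this sketch, standing in for the final
  // sort mentioned in the commit message.
  std::vector<int64_t> SortedPositions() const {
    std::vector<int64_t> out;
    out.reserve(size_);
    for (const std::vector<int64_t>& chunk : chunks_) {
      out.insert(out.end(), chunk.begin(), chunk.end());
    }
    std::sort(out.begin(), out.end());
    return out;
  }

  size_t size() const { return size_; }
  size_t num_chunks() const { return chunks_.size(); }

 private:
  size_t chunk_capacity_;
  size_t size_ = 0;
  std::vector<std::vector<int64_t>> chunks_;
};
```

In this sketch, growing the outer vector only moves the inner vectors'
internal pointers, not the collected records, which is why large
re-allocations are avoided. A caller would invoke AppendRange once per
contiguous run of positions in a build batch.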

Measurements

Local measurement with 824 million position delete records:
JOIN BUILD: ~32s -> ~14s (of which 6s is the final sort)

40-node cluster with 68.5 billion position delete records:
JOIN BUILD: 4m15s -> 1m45s (of which 1m7s is the final sort)

Parallelization of the final sort will be added in a follow-up CR.

Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
---
M be/src/exec/iceberg-delete-builder.cc
M be/src/exec/iceberg-delete-builder.h
2 files changed, 101 insertions(+), 22 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/21435/1
--
To view, visit http://gerrit.cloudera.org:8080/21435

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
