Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21435
Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder ...................................................................... IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder When there are lots of delete records the IcebergDeleteBuilder can become a bottleneck. Since the left side of the JOIN is blocked on the build side any improvement we make here significantly improves Iceberg V2 table scanning. Improvements of this patch: * Use a vector of vectors to collect the position delete records. This way we can avoid large re-allocations and copyings. * Insert large ranges from the build batches into the collected delete records instead of doing it one-by-one. Measurements Local measurement with 824 Million position delete records: JOIN BUILD: ~32s -> ~14s (6s is the final sorting) 40-node cluster with 68.5 Billion position delete records: JOIN BUILD: 4m15s -> 1m45s (1m7s is the final sorting) Parallelization of the final sort will be added in a follow-up CR. Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf --- M be/src/exec/iceberg-delete-builder.cc M be/src/exec/iceberg-delete-builder.h 2 files changed, 101 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/21435/1 -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>