Hello Joe McDonnell, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/22217
to look at the new patch set (#8).
Change subject: WIP IMPALA-13475: Mem limit deferred RPC processing parallelism
......................................................................
WIP IMPALA-13475: Mem limit deferred RPC processing parallelism
Before this patch, KrpcDataStreamRecvr::SenderQueue::GetBatch() did not
consider the memory requirement of the batches when initiating the
deserialization of deferred RPCs (EnqueueDeserializeTask) and tried
to deserialize as many batches in parallel as possible, based
on FLAGS_datastream_service_num_deserialization_threads.
This meant that some of the awakened threads could be rejected due to
hitting the mem limit, in which case the deferred RPC was put back in
the queue. Besides wasting resources, this could also starve bigger
RPCs by repeatedly rejecting them while smaller RPCs took over.
The patch introduces the concept of "pending deferred RPCs", which
are RPCs whose processing was requested but which no deserialization
thread has picked up yet. The deserialized size of pending deferred
RPCs is considered already taken, and only N deserializations are
started if the first N RPCs' payloads would fit in memory (with the
exception of an empty queue, where at least one is started).
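The admission rule above could be sketched roughly like this (a minimal
illustration; the function name and signature are assumptions, not the
patch's actual code):

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Illustrative sketch only. The deserialized sizes of pending deferred
// RPCs count as already-taken memory, so we start only as many
// deserialization tasks as fit under the limit, except that at least
// one is started when the batch queue is empty so progress is made.
int NumDeserializationsToStart(const std::deque<int64_t>& deferred_sizes,
                               int64_t mem_available, bool queue_empty) {
  int n = 0;
  int64_t reserved = 0;
  for (int64_t size : deferred_sizes) {
    if (reserved + size > mem_available) break;  // would exceed the limit
    reserved += size;
    ++n;
  }
  // "Respect mem limit, with the exception when empty": an empty queue
  // still gets at least one batch.
  if (n == 0 && queue_empty && !deferred_sizes.empty()) n = 1;
  return n;
}
```

For example, with payloads of 100, 200 and 300 bytes and 350 bytes
available, only the first two deserializations would be started.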
The state flow of deferred RPCs became the following, where each
transition should be able to succeed:
1. no deserialization thread started (in deferred_rpcs_)
2. deserialization thread started (in pending_deferred_rpcs_)
3. currently deserialized (tracked in num_pending_enqueue_)
4. enqueued
Note that it is still possible for batches to overtake each other in
state 3 if their deserialization finishes faster.
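The four states above could be captured with an enum along these lines
(the enum and function names are hypothetical; only the member names in
the comments come from the commit message):

```cpp
#include <cassert>

// Illustrative sketch of the deferred-RPC state flow.
enum class DeferredRpcState {
  kDeferred = 0,         // 1. no deserialization thread (deferred_rpcs_)
  kPendingDeferred = 1,  // 2. thread started (pending_deferred_rpcs_)
  kDeserializing = 2,    // 3. being deserialized (num_pending_enqueue_)
  kEnqueued = 3          // 4. batch enqueued
};

// Each transition should be able to succeed, so the state only ever
// advances by exactly one step.
bool IsValidTransition(DeferredRpcState from, DeferredRpcState to) {
  return static_cast<int>(to) == static_cast<int>(from) + 1;
}
```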
Merging exchanges need special handling, as before this patch they had
multiple queues sharing a single mem limit. The patch splits the mem
limit evenly among the queues to allow them to enforce the "respect
mem limit, except when empty" rule individually, as keeping at least
one batch in the queue per sender is critical for the merge sort to
progress. This may lead to perf regressions in skewed merging
exchanges where one sender may "need" the queue space more than
others.
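The even split amounts to a simple division (names are assumptions, not
the patch's code):

```cpp
#include <cassert>
#include <cstdint>

// Illustrative sketch: the shared receiver-side mem limit is divided
// evenly among the per-sender queues of a merging exchange, regardless
// of how skewed the senders' actual needs are.
int64_t PerQueueMemLimit(int64_t total_mem_limit, int64_t num_queues) {
  return total_mem_limit / num_queues;
}
```

With a 300 MB limit and 3 senders, each queue gets 100 MB even if one
sender would benefit from most of the memory, which is the skew
regression noted above.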
TODOs:
- early senders can't calculate the deserialized size yet and are
probably not handled correctly
- find a better name than pending deferred RPCs?
- properly think the merging exchange case through
Change-Id: I4d8e2b07a19e4d30210d7903c01e75fba632a1de
---
M be/src/runtime/krpc-data-stream-mgr.cc
M be/src/runtime/krpc-data-stream-mgr.h
M be/src/runtime/krpc-data-stream-recvr.cc
M be/src/runtime/krpc-data-stream-recvr.h
M be/src/util/spinlock.h
M tests/custom_cluster/test_exchange_deferred_batches.py
6 files changed, 257 insertions(+), 109 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/22217/8
--
To view, visit http://gerrit.cloudera.org:8080/22217
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4d8e2b07a19e4d30210d7903c01e75fba632a1de
Gerrit-Change-Number: 22217
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>