Alexey Serbin has uploaded this change for review. (
http://gerrit.cloudera.org:8080/24176
Change subject: WIP [rpc] in-bulk memory recycling for
Connection::ProcessOutboundTransfers()
......................................................................
WIP [rpc] in-bulk memory recycling for Connection::ProcessOutboundTransfers()
WIP:
* collect the initial feedback
* add a test/perf scenario if not covered by rpc-bench and similar?
* provide perf report from rpc-bench (or from the newly added test)
before and after the update
While troubleshooting RPC performance issues in a highly concurrent
workload, I noticed a pattern of lock contention in tcmalloc. For more
context, the majority of RPCs had large side-cars and a single RPC
connection often had a multitude of outgoing in-flight transfers.
Among the captured stack traces, multiple reactor threads in a single
pstack snapshot often had stack traces similar to the one below.
It seems the problem stems from the fact that allocation/deallocation
of large (size > 256KByte) memory chunks in tcmalloc often goes through
the central free list, while the latter one is guarded by a lock.
This patch is an attempt to reduce the lock contention by performing
the socket I/O for all the outgoing transfers with pending data first,
and performing memory deallocation after that in bulk. In addition,
it straightens the memory ownership rules for the
OutboundTransfer::callbacks_ field and modernized signature of the
related methods to return std::unique_ptr instead of raw pointers.
#0 sys_futex (... <tcmalloc::Static::pageheap_lock_>)
#1 base::internal::SpinLockDelay (...)
...
#7 0x00000000038c7ee1 in tcmalloc::ThreadCache::ReleaseToCentralCache(...) ()
#8 0x00000000038c8505 in tcmalloc::ThreadCache::ListTooLong(...) ()
#9 0x00000000038d9103 in google::protobuf::internal::ArenaImpl::~ArenaImpl()
()
#10 0x0000000002043eec in google::protobuf::Arena::~Arena ()
#11 kudu::rpc::InboundCall::~InboundCall (...)
...
#16 kudu::rpc::ResponseTransferCallbacks::NotifyTransferFinished (...)
#17 0x00000000020748be in kudu::rpc::OutboundTransfer::SendBuffer (...)
#18 0x0000000002077de9 in kudu::rpc::Connection::ProcessOutboundTransfers
(...)
#19 0x0000000002078440 in kudu::rpc::Connection::QueueOutbound (...)
#20 0x000000000207ade1 in kudu::rpc::QueueTransferTask::Run (...)
#21 0x0000000002052b40 in kudu::rpc::ReactorThread::AsyncHandler (...)
#22 0x000000000376e063 in ev_invoke_pending ()
#23 0x000000000204f5a1 in kudu::rpc::ReactorThread::InvokePendingCb (...)
#24 0x00000000037713b5 in ev_run ()
#25 0x00000000020508eb in ev::loop_ref::run (...)
#26 kudu::rpc::ReactorThread::RunThread (...)
Change-Id: Idf7ab105a851ef4d583efc2d1b33d57607810df0
---
M src/kudu/rpc/connection.cc
M src/kudu/rpc/transfer.cc
M src/kudu/rpc/transfer.h
3 files changed, 89 insertions(+), 45 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/24176/1
--
To view, visit http://gerrit.cloudera.org:8080/24176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Idf7ab105a851ef4d583efc2d1b33d57607810df0
Gerrit-Change-Number: 24176
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <[email protected]>