Hello Kurt Deschler, Kudu Jenkins, Joe McDonnell, Csaba Ringhofer, Michael 
Smith,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/24176

to look at the new patch set (#2).

Change subject: WIP [rpc] in-bulk memory recycling for 
Connection::ProcessOutboundTransfers()
......................................................................

WIP [rpc] in-bulk memory recycling for Connection::ProcessOutboundTransfers()

WIP:
  * collect the initial feedback
  * add a test/perf scenario if not covered by rpc-bench and similar?
  * provide perf report from rpc-bench (or from the newly added test)
    before and after the update

While troubleshooting RPC performance issues in a highly concurrent
workload, I noticed a pattern of lock contention in tcmalloc.  For more
context, the majority of RPCs had relatively large side-cars and
a single RPC connection often had a multitude of outgoing in-flight
transfers.  Also, at the OS/system level, the socket buffers were
configured to be of tens of megabytes in size.

Among the captured stack traces, multiple reactor threads in a single
pstack snapshot often had stack traces similar to the one below.

It seems the issue manifests itself when many allocations/deallocations
go through the tcmalloc's central free list, and the latter one is
guarded by a lock.  There might be multiple reasons why it happens [1],
but in any case it's better to avoid blocking I/O because of the lock
contention in tcmalloc.

This patch is an attempt to reduce lock contention by performing socket
I/O for all the outgoing transfers with pending data first, and
deallocating the memory after that in bulk.  In addition, it straightens
the memory ownership rules for the OutboundTransfer::callbacks_ field
and modernizes signatures of related methods to return std::unique_ptr
instead of raw pointers.

  #0  sys_futex (... <tcmalloc::Static::pageheap_lock_>)
  #1  base::internal::SpinLockDelay (...)
  ...
  #7  0x00000000038c7ee1 in tcmalloc::ThreadCache::ReleaseToCentralCache(...) ()
  #8  0x00000000038c8505 in tcmalloc::ThreadCache::ListTooLong(...) ()
  #9  0x00000000038d9103 in google::protobuf::internal::ArenaImpl::~ArenaImpl() 
()
  #10 0x0000000002043eec in google::protobuf::Arena::~Arena ()
  #11 kudu::rpc::InboundCall::~InboundCall (...)
  ...
  #16 kudu::rpc::ResponseTransferCallbacks::NotifyTransferFinished (...)
  #17 0x00000000020748be in kudu::rpc::OutboundTransfer::SendBuffer (...)
  #18 0x0000000002077de9 in kudu::rpc::Connection::ProcessOutboundTransfers 
(...)
  #19 0x0000000002078440 in kudu::rpc::Connection::QueueOutbound (...)
  #20 0x000000000207ade1 in kudu::rpc::QueueTransferTask::Run (...)
  #21 0x0000000002052b40 in kudu::rpc::ReactorThread::AsyncHandler (...)
  #22 0x000000000376e063 in ev_invoke_pending ()
  #23 0x000000000204f5a1 in kudu::rpc::ReactorThread::InvokePendingCb (...)
  #24 0x00000000037713b5 in ev_run ()
  #25 0x00000000020508eb in ev::loop_ref::run (...)
  #26 kudu::rpc::ReactorThread::RunThread (...)

[1] https://gperftools.github.io/gperftools/tcmalloc.html
Change-Id: Idf7ab105a851ef4d583efc2d1b33d57607810df0
---
M src/kudu/rpc/connection.cc
M src/kudu/rpc/transfer.cc
M src/kudu/rpc/transfer.h
3 files changed, 91 insertions(+), 47 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/24176/2
--
To view, visit http://gerrit.cloudera.org:8080/24176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idf7ab105a851ef4d583efc2d1b33d57607810df0
Gerrit-Change-Number: 24176
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>

Reply via email to