[kudu-CR] [util] use lighter locking scheme for ThreadMgr

Alexey Serbin (Code Review) Thu, 27 Dec 2018 16:12:43 -0800

Hello Will Berkeley, Kudu Jenkins, Adar Dembo,

I'd like you to reexamine a change. Please visit


    http://gerrit.cloudera.org:8080/12112

to look at the new patch set (#8).

Change subject: [util] use lighter locking scheme for ThreadMgr
......................................................................

[util] use lighter locking scheme for ThreadMgr

Use the rw_spinlock primitive to guard the ThreadMgr's registry
threads instead of Mutex.  Also, use the shared lock pattern to deal
with write and read access to the guarded entities of the ThreadMgr.
In addition, do not hold the lock for the whole duration of generating
the /threadz page for the embedded webserver.

With this patch, ThreadMgr now uses std::unordered_map as a container
for category-specific thread information and enforces stricter
consistency rules while removing thread information from its registry.
The process of adding an information about a thread into the registry
is not guarded by stricter consistency checks (i.e. it stays as it was
before this patch); see below for explanation.

NOTE:
  The stricter consistency around adding a thread into the ThreadMgr's
  registry caused issues with multiprocessing in Python tests,
  and I spent some time trying to work around that.  However, that was
  not fruitful.  I think the proper solution would be keeping the thread
  registry bound to some top-level object (like Kudu client or
  ServerBase object) and cleaning it up in accordance with the object's
  life cycle.

  In essence, the problem happens due to the combination of the
  following:
    * The way how workers are spawned by the multiprocessing.Pool,
      i.e. calling fork() at the point where Pool is instantiated.
    * The fact that pthread_t handles might be the same for threads
      in different processes.

  In detail, if multiprocessing.Pool is spawning worker processes after
  an instance of Kudu client has been created in the main test process,
  every worker gets a copy of the thread registry in its address space.
  The unexpected copy of the registry in a worker process is populated
  with the information on threads spawned due to the activity of the
  Kudu client in the main test process.  Later on, if a worker
  instantiates a Kudu client on its own, the newly spawned treads by
  the worker's Kudu client might have the same pthread_t handles
  as the threads whose information records are inadvertently inherited
  from the main test process.

  As an additional piece of joy, it turned out that it's impossible to
  handle worker's crash in a multiprocessing.Pool, and that's by design.
  When a worker process crashes while holding the call_queue.rlock,
  no other process will ever be able to read the call_queue anymore,
  breaking the Pool as it cannot communicate with its workers.

  BTW, in Python 3 the problem with the duplicated address space
  in worker processes has been resolved by using context and additional
  worker spawn modes (e.g., 'forkserver').

Change-Id: I4d49c1c39392e01c45019844430a4fe3d116c277
---
M src/kudu/util/os-util.cc
M src/kudu/util/thread.cc
2 files changed, 101 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/12/12112/8
--
To view, visit http://gerrit.cloudera.org:8080/12112
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4d49c1c39392e01c45019844430a4fe3d116c277
Gerrit-Change-Number: 12112
Gerrit-PatchSet: 8
Gerrit-Owner: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com>

[kudu-CR] [util] use lighter locking scheme for ThreadMgr

Reply via email to