Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5741

to look at the new patch set (#9).

Change subject: Control mutex stack walking in DEBUG mode with a gflag
......................................................................

Control mutex stack walking in DEBUG mode with a gflag

This patch disables the Mutex owner stack trace collection on DEBUG
builds by default, only enabling it when a certain gflag is set.

In DEBUG mode, our Mutex implementation collects a stack trace of the
owning thread each time the Mutex is acquired. It does this by calling
google::GetStackTrace() from glog, which in the context of the Kudu
build environment calls into libunwind to collect that stack trace.

At the time of writing, google::GetStackTrace() only allows access by
one thread at a time. If more than one thread attempts to invoke this
function simultaneously, a compare-and-swap (CAS) determines which
thread gets exclusive access. The "loser" of this contest gets a
short-circuit return along with an empty stack trace, indicating a
failure to collect the stack trace.
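
To illustrate the behavior described above, here is a minimal sketch of
a CAS-guarded collector. It is not glog's code: ExclusiveCollector, its
fake frames, and the SimulateConcurrentCaller() hook are hypothetical
names invented for this example, and it only models "one caller at a
time, losers get an empty result".

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a non-reentrant stack collector. This does not
// reproduce glog; it only models the exclusivity check described above.
class ExclusiveCollector {
 public:
  // Fills 'frames' and returns the number of frames collected, or returns 0
  // with an empty 'frames' if another caller currently holds the guard.
  int Collect(std::vector<std::string>* frames, int max_depth) {
    assert(frames != nullptr && max_depth >= 0);
    frames->clear();
    bool expected = false;
    // CAS: only the caller that flips busy_ from false to true proceeds.
    if (!busy_.compare_exchange_strong(expected, true)) {
      return 0;  // The "loser" short-circuits with an empty trace.
    }
    for (int i = 0; i < max_depth; ++i) {
      frames->push_back("frame#" + std::to_string(i));  // Fake frames.
    }
    busy_.store(false);  // Release the guard for the next caller.
    return static_cast<int>(frames->size());
  }

  // Illustration-only hook: pretend another thread is inside Collect().
  void SimulateConcurrentCaller(bool in_progress) { busy_.store(in_progress); }

 private:
  std::atomic<bool> busy_{false};
};
```

In this sketch, a call made while SimulateConcurrentCaller(true) is in
effect returns 0 frames, which mirrors the empty stack trace that the
losing thread sees.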

NB: I have filed a glog issue about that behavior upstream. For more
information, see https://github.com/google/glog/issues/160

This situation becomes a problem when there are one or more Mutexes
constantly being acquired. When that happens, there is always a thread
collecting a stack trace, and so the probability of being able to
successfully collect a stack trace at any given moment is greatly
reduced.

Important callers of google::GetStackTrace() include the glog failure
function and the SIGABRT signal handler, which run when a CHECK() fails
or a LOG(FATAL) call is invoked. I have observed that this crash handler
will often print an empty stack trace in DEBUG mode. Investigating this
issue led me to discover that we had a thread (our AsyncLogger thread)
constantly acquiring a Mutex and racing on the above-mentioned CAS check
inside google::GetStackTrace(). Depriving our DEBUG builds of stack
traces on LOG(FATAL) or CHECK failures, especially on Jenkins runs, is
counterproductive. One simple solution to this problem is to disable
this behavior by default.
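
The general shape of such a gate can be sketched as follows. This is not
the code in this patch: the flag is modeled as a plain bool rather than
a gflag (the real flag name is not given in this message), and
DebugMutex, g_collect_owner_stack, and FakeCollectStack() are
hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical flag, off by default. The real patch uses a gflag instead.
static bool g_collect_owner_stack = false;

// Hypothetical stand-in for an expensive stack walk.
inline std::string FakeCollectStack() { return "owner-stack"; }

// Sketch of a mutex that records its owner's stack only when the flag is on.
class DebugMutex {
 public:
  void Acquire() {
    mu_.lock();
    // Skip the stack walk entirely unless explicitly enabled.
    if (g_collect_owner_stack) {
      owner_stack_ = FakeCollectStack();
    }
  }

  void Release() {
    owner_stack_.clear();  // Forget the previous owner before unlocking.
    mu_.unlock();
  }

  // Only meaningful while the caller holds the mutex.
  const std::string& owner_stack() const { return owner_stack_; }

 private:
  std::mutex mu_;
  std::string owner_stack_;
};
```

With the flag left at its default, Acquire() never calls the stack
walker, so a frequently acquired mutex no longer competes with the crash
handler for the stack collector.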

Change-Id: Ie4593cf7173867ce2f6151e03df0be94f97d95d2
---
M src/kudu/util/mutex.cc
M src/kudu/util/mutex.h
2 files changed, 37 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/41/5741/9
-- 
To view, visit http://gerrit.cloudera.org:8080/5741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie4593cf7173867ce2f6151e03df0be94f97d95d2
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
