Hello Mike Percy, I'd like you to do a code review. Please visit
http://gerrit.cloudera.org:8080/9430 to review the following change. Change subject: KUDU-2325. Fix CHECK failure in RevokeSigData() ...................................................................... KUDU-2325. Fix CHECK failure in RevokeSigData() This fixes a race in the stack trace signaling mechanism: Target thread in HandleStackTraceSignal(): ----------------------------------------- - stack->Collect() - queued_to_tid = kNotInUse <context switched out> Tracer thread in RevokeSigData: ------------------------------- - see that the flag is not complete - exchange out queued_to_tid - assertions checked that queued_to_tid was either equal to the target tid or kDumpInProgress (because it was "not complete yet" it didn't expect kNotInUse) - mark complete This was fairly rare since RevokeSigData had to run in between the setting of 'queued_to_tid = kNotInUse' and the signalling of the completion flag. A new test added in this commit managed to trigger it pretty reliably by measuring the usual time taken for stack trace collection and setting timeouts to very close to that value. The fix is to simplify the state machine a bit so that the target thread doesn't ever transition back to 'kNotInUse'. I added a more detailed diagram of the state machine to clarify this. Change-Id: I419ffa462821c1e6d8f9f158f028b86d60f4fd54 --- M src/kudu/util/debug-util-test.cc M src/kudu/util/debug-util.cc 2 files changed, 111 insertions(+), 38 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/9430/1 -- To view, visit http://gerrit.cloudera.org:8080/9430 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I419ffa462821c1e6d8f9f158f028b86d60f4fd54 Gerrit-Change-Number: 9430 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon <t...@apache.org> Gerrit-Reviewer: Mike Percy <mpe...@apache.org>