Todd Lipcon has posted comments on this change.

Change subject: kernel_stack_watchdog: avoid blocking threads starting
......................................................................

Patch Set 2:

(2 comments)

To find the root causes I was basically just looking at gstacks and adding LOG_IF_SLOW calls in various places, nothing too fancy.

http://gerrit.cloudera.org:8080/#/c/4626/2//COMMIT_MSG
Commit Message:

PS2, Line 12: TSAN defers signal-handling
> Just so I understand, what you mean is that TSAN handles the signal but tak

Yeah, I added some LOG_IF_SLOW calls to the Register(TLS) function and found that it was sometimes blocked for 100+ ms, usually at the same time as the watchdog was attempting to dump a stack.

PS2, Line 21: However, it's still important to prevent these
            : threads from _exiting_ while we are looking at their TLS
> But presumably delaying Thread.Join() has little to no effect on test flaki

Yeah, there's a comment in the code to that effect. Thread _exits_ are basically never on a critical path, whereas thread creation often is (e.g., starting threadpool workers).

-- 
To view, visit http://gerrit.cloudera.org:8080/4626
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7af85ade6ec9050843ec5b146d26c2549c503d8f
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes
