Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12739 )
Change subject: [subprocess] use RAW_LOG() in child process ...................................................................... Patch Set 1: (7 comments) http://gerrit.cloudera.org:8080/#/c/12739/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12739/1//COMMIT_MSG@16 PS1, Line 16: TODO: add a test scenario to simulate the deadlock and make sure : it passes with this patch Maybe a loop that calls Subprocess::Call() on a non-existent program, and another thread that loops on a LOG() call? http://gerrit.cloudera.org:8080/#/c/12739/1/src/kudu/util/signal.cc File src/kudu/util/signal.cc: http://gerrit.cloudera.org:8080/#/c/12739/1/src/kudu/util/signal.cc@43 PS1, Line 43: // We unblock all signal masks since they are inherited. Maybe note that the use of RAW_LOG is so that the function remains async-signal-safe? http://gerrit.cloudera.org:8080/#/c/12739/1/src/kudu/util/signal.cc@48 PS1, Line 48: RAW_LOG(FATAL, "sigfillset() failed: [%i]", err); One of the other nice things that PCHECK/PLOG do is convert errno into a textual representation via glog's internal StrError (which uses strerror_r under the hood). Not saying we should necessarily do that here, just wanted to point it out. http://gerrit.cloudera.org:8080/#/c/12739/1/src/kudu/util/subprocess.cc File src/kudu/util/subprocess.cc: http://gerrit.cloudera.org:8080/#/c/12739/1/src/kudu/util/subprocess.cc@377 PS1, Line 377: generic general http://gerrit.cloudera.org:8080/#/c/12739/1/src/kudu/util/subprocess.cc@378 PS1, Line 378: In particular, trying : // to log anything via GLOG involves taking a lock on mutex that might be : // copied over from the parent in already locked state, so the child would : // deadlock while doing to log anything using LOG() macro Nit: reword as "Surprisingly, a call to LOG() locks a mutex that may have been copied from the parent's address space in an already locked state, so it is not async-signal-safe and can deadlock the child if called." http://gerrit.cloudera.org:8080/#/c/12739/1/src/kudu/util/subprocess.cc@383 PS1, Line 383: it's not async-signal-safe : // strictly speaking The async-signal-safe documentation agrees with this, but I don't understand why: does vsnprintf have static state? http://gerrit.cloudera.org:8080/#/c/12739/1/src/kudu/util/subprocess.cc@385 PS1, Line 385: the drop 'the' -- To view, visit http://gerrit.cloudera.org:8080/12739 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic9dca4ca8b1a6d72c9fc818ea41109c80ace3e39 Gerrit-Change-Number: 12739 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Will Berkeley <[email protected]> Gerrit-Comment-Date: Wed, 13 Mar 2019 04:07:00 +0000 Gerrit-HasComments: Yes
