Hi Daniel, test if I understand this correctly:
The original bug: we execute NestedThreadsListHandleInErrorHandlingTest, which is supposed to controlled-crash the VM and find SMR related info in hs-err file. But during error reporting, ThreadsSMRSupport::print_info_on() crashes, since internal SMR data structures are modified concurrently and ThreadsSMRSupport::print_info_on() does not lock. The solution was to unconditionally, in VMError::controlled_crash(), grab the Threads_lock. David spotted one path where this may be called recursively: when we test secondary signal handling, we call VMError::controlled_crash() from within the error handler. In that case, the second call to VMError::controlled_crash() will now, instead of crashing, trigger an assert, since we already own Threads_lock. This did not trip up SecondaryErrorTest.java since it was supposed to crash. Now it crashes with an assert, not a secondary signal. Which quietly castrates the test, btw - the reason we contributed this as a regression for running into blocked signals during signal handling, which is a bad thing. Converting this second signal to assert makes the test ineffective. So far correct? --- Yes, the secondary signal test is too coarse. In our port, we have embellished the "Error occurred during error reporting" message and print out the signal number and pc. But we have not yet brought that improvement upstream. I think your patch is okay. But now I wonder if it would be cleaner to try-lock-if-not-owned inside ThreadsSMRSupport::print_info_on() ? Or its caller, Thread::print_on_error()? Or above that, around that call in that one STEP inside VMError::report() ? Thanks, Thomas On Tue, Jun 26, 2018 at 6:17 PM, Daniel D. Daugherty <daniel.daughe...@oracle.com> wrote: > Thanks for the quick review Serguei! > > One more review would be nice... David H. or Thomas Stüfe? > > > On 6/26/18 12:12 PM, serguei.spit...@oracle.com wrote: >> >> Hi Dan, >> >> The fix looks reasonable to me. > > > Thanks. I like simple one liners... :-) > > >> Nice catch by David! > > > Definitely. > > Dan > > > >> >> Thanks, >> Serguei >> >> >> On 6/26/18 06:06, Daniel D. Daugherty wrote: >>> >>> Greetings, >>> >>> For this one liner review, I'd love to hear from David H., Thomas Stüfe, >>> and Serguei Spitsyn. >>> >>> David Holmes spotted a locking problem with my fix for the following bug: >>> >>> JDK-8205195 NestedThreadsListHandleInErrorHandlingTest fails because >>> hs_err doesn't contain _nested_thread_list_max >>> https://bugs.openjdk.java.net/browse/JDK-8205195 >>> >>> Webrev URL: >>> http://cr.openjdk.java.net/~dcubed/8205195-webrev/0-for-jdk-jdk/ >>> >>> The fix for the issue is a one liner that only grabs the Threads_lock >>> when the caller does not already own it: >>> >>> $ hg diff >>> diff -r 5698cf4e50f1 src/hotspot/share/utilities/vmError.cpp >>> --- a/src/hotspot/share/utilities/vmError.cpp Fri Jun 22 12:15:16 2018 >>> -0400 >>> +++ b/src/hotspot/share/utilities/vmError.cpp Mon Jun 25 21:20:52 2018 >>> -0400 >>> @@ -1703,7 +1703,7 @@ >>> // from racing with Threads::add() or Threads::remove() as we >>> // generate the hs_err_pid file. This makes our ErrorHandling tests >>> // more stable. >>> - MutexLockerEx ml(Threads_lock, Mutex::_no_safepoint_check_flag); >>> + MutexLockerEx ml(Threads_lock->owned_by_self() ? NULL : Threads_lock, >>> Mutex::_no_safepoint_check_flag); >>> >>> switch (how) { >>> case 1: vmassert(str == NULL, "expected null"); break; >>> >>> >>> Figuring out a way to detect the failure mode in order to test the >>> fix was much more involved than the fix itself. Gory details are >>> in the bug: >>> >>> JDK-8205648 fix for 8205195 breaks secondary error handling >>> https://bugs.openjdk.java.net/browse/JDK-8205648 >>> >>> No webrev since the one liner diff is shown above. This fix was tested >>> with the test/hotspot/jtreg/runtime/ErrorHandling tests on Linux-X64 >>> and with a special version of ErrorHandling/SecondaryErrorTest.java >>> that is documented in JDK-8205648. >>> >>> Thanks, in advance, for any comments, questions or suggestions. >>> >>> Dan >> >> >