On Tue, Jun 26, 2018 at 7:06 PM, Thomas Stüfe <thomas.stu...@gmail.com> wrote: > Hi Daniel, > > test if I understand this correctly: > > The original bug: we execute > NestedThreadsListHandleInErrorHandlingTest, which is supposed to > controlled-crash the VM and find SMR related info in hs-err file. But > during error reporting, ThreadsSMRSupport::print_info_on() crashes, > since internal SMR data structures are modified concurrently and > ThreadsSMRSupport::print_info_on() does not lock. The solution was to > unconditionally, in VMError::controlled_crash(), grab the > Threads_lock. > > David spotted one path where this may be called recursively: when we > test secondary signal handling, we call VMError::controlled_crash() > from within the error handler. In that case, the second call to > VMError::controlled_crash() will now, instead of crashing, trigger an > assert, since we already own Threads_lock. > > This did not trip up SecondaryErrorTest.java since it was supposed to > crash. Now it crashes with an assert, not a secondary signal. Which > quietly castrates the test, btw - the reason we contributed this as a > regression for running into blocked signals during signal handling, > which is a bad thing. Converting this second signal to assert makes > the test ineffective. > > So far correct? > > --- > > Yes, the secondary signal test is too coarse. In our port, we have > embellished the "Error occurred during error reporting" message and > print out the signal number and pc. But we have not yet brought that > improvement upstream. > > I think your patch is okay. > > But now I wonder if it would be cleaner to try-lock-if-not-owned > inside ThreadsSMRSupport::print_info_on() ? Or its caller, > Thread::print_on_error()? Or above that, around that call in that one > STEP inside VMError::report() ? >
just to be clear, I am fine with the patch in its current form, if you want to ship that. ..Thomas > Thanks, Thomas > > > On Tue, Jun 26, 2018 at 6:17 PM, Daniel D. Daugherty > <daniel.daughe...@oracle.com> wrote: >> Thanks for the quick review Serguei! >> >> One more review would be nice... David H. or Thomas Stüfe? >> >> >> On 6/26/18 12:12 PM, serguei.spit...@oracle.com wrote: >>> >>> Hi Dan, >>> >>> The fix looks reasonable to me. >> >> >> Thanks. I like simple one liners... :-) >> >> >>> Nice catch by David! >> >> >> Definitely. >> >> Dan >> >> >> >>> >>> Thanks, >>> Serguei >>> >>> >>> On 6/26/18 06:06, Daniel D. Daugherty wrote: >>>> >>>> Greetings, >>>> >>>> For this one liner review, I'd love to hear from David H., Thomas Stüfe, >>>> and Serguei Spitsyn. >>>> >>>> David Holmes spotted a locking problem with my fix for the following bug: >>>> >>>> JDK-8205195 NestedThreadsListHandleInErrorHandlingTest fails because >>>> hs_err doesn't contain _nested_thread_list_max >>>> https://bugs.openjdk.java.net/browse/JDK-8205195 >>>> >>>> Webrev URL: >>>> http://cr.openjdk.java.net/~dcubed/8205195-webrev/0-for-jdk-jdk/ >>>> >>>> The fix for the issue is a one liner that only grabs the Threads_lock >>>> when the caller does not already own it: >>>> >>>> $ hg diff >>>> diff -r 5698cf4e50f1 src/hotspot/share/utilities/vmError.cpp >>>> --- a/src/hotspot/share/utilities/vmError.cpp Fri Jun 22 12:15:16 2018 >>>> -0400 >>>> +++ b/src/hotspot/share/utilities/vmError.cpp Mon Jun 25 21:20:52 2018 >>>> -0400 >>>> @@ -1703,7 +1703,7 @@ >>>> // from racing with Threads::add() or Threads::remove() as we >>>> // generate the hs_err_pid file. This makes our ErrorHandling tests >>>> // more stable. >>>> - MutexLockerEx ml(Threads_lock, Mutex::_no_safepoint_check_flag); >>>> + MutexLockerEx ml(Threads_lock->owned_by_self() ? NULL : Threads_lock, >>>> Mutex::_no_safepoint_check_flag); >>>> >>>> switch (how) { >>>> case 1: vmassert(str == NULL, "expected null"); break; >>>> >>>> >>>> Figuring out a way to detect the failure mode in order to test the >>>> fix was much more involved than the fix itself. Gory details are >>>> in the bug: >>>> >>>> JDK-8205648 fix for 8205195 breaks secondary error handling >>>> https://bugs.openjdk.java.net/browse/JDK-8205648 >>>> >>>> No webrev since the one liner diff is shown above. This fix was tested >>>> with the test/hotspot/jtreg/runtime/ErrorHandling tests on Linux-X64 >>>> and with a special version of ErrorHandling/SecondaryErrorTest.java >>>> that is documented in JDK-8205648. >>>> >>>> Thanks, in advance, for any comments, questions or suggestions. >>>> >>>> Dan >>> >>> >>