Code looks good. Thx for the local in person review. Thx Ron
> -----Original Message----- > From: Daniel D. Daugherty > Sent: Tuesday, February 25, 2014 9:04 AM > To: David Holmes; Serguei Spitsyn; Dave Dice; Karen Kinnear > Cc: serviceability-dev@openjdk.java.net; hotspot-runtime-...@openjdk.java.net > Subject: Re: RFR round 0 JDK8u backport of ObjectMonitor-JVM/TI hang fix > (8028073) > > Ping! Still haven't heard from anyone on this backport... > > Dan > > > On 2/21/14 8:40 PM, Daniel D. Daugherty wrote: > > Greetings, > > > > This is a code review request for the JDK8u-hs-dev backport of the > > following ObjectMonitor-JVM/TI hang fix: > > > > 8028073 race condition in ObjectMonitor implementation causing > > deadlocks > > https://bugs.openjdk.java.net/browse/JDK-8028073 > > > > Here is the JDK8u-hs-dev webrev URL: > > > > http://cr.openjdk.java.net/~dcubed/8028073-webrev/0-jdk8u-hs-dev/ > > > > This is _almost_ a straight forward backport of the JDK9 fix. The only > > difference to the fix was discussed at the end of the JDK9 review and > > was determined to only be needed in versions of HotSpot without the > > fix for 8028280: > > > > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2014-February/010745.html > > > > > > 8028280 has not yet been backported to JDK8u-hs-dev. > > > > The easiest way to review the backport is to download the two patch > > files from the webrevs and compare them with something like: > > > > jfilemerge -r -w 8028073_exp.patch 8028073_exp_for_jdk8u_hs.patch > > > > The same testing has been performed on the JDK8u-hs-dev version as > > with the JDK9-hs-runtime version. > > > > Thanks, in advance, for any comments, questions or suggestions. > > > > Dan > > > > > > On 2/1/14 11:38 AM, Daniel D. Daugherty wrote: > > > Greetings, > > > > > > I have a fix ready for the following bug: > > > > > > 8028073 race condition in ObjectMonitor implementation causing > > deadlocks > > > https://bugs.openjdk.java.net/browse/JDK-8028073 > > > > > > On the surface, this is a very simple fix that relocates a few lines of > > > code, relocates and rewrites the comments associated with that code and > > > adds several new comments. > > > > > > Of course, in reality, the issue is much more complicated, but I'm > > > hoping to make it easy for anyone not acquainted with this issue to > > > understand what's going on. > > > > > > Here are the JDK9 webrev URLs: > > > > > > OpenJDK: > > > http://cr.openjdk.java.net/~dcubed/8028073-webrev/0-jdk9-hs-runtime/ > > > > > > Oracle internal: > > > > > http://javaweb.us.oracle.com/~ddaugher/8028073-webrev/0-jdk9-hs-runtime/ > > > > > > The simple summary: > > > > > > - since Java Monitors and JVM/TI RawMonitors share a ParkEvent, > > > it is possible for a JVM/TI monitor event handler to accidentally > > > consume a ParkEvent.unpark() call meant for Java Monitor layer > > > - the original code fix was made on 2005.07.04 using this bug ID: > > > https://bugs.openjdk.java.net/browse/JDK-5030359 > > > - it's the right fix, but it's in the wrong place > > > - the fix needs to be after the JVMTI_EVENT_MONITOR_WAITED > > > event handler is called because it is that event handler > > > that can cause the hang > > > > > > > > > Testing > > > ------- > > > > > > - a new StessMonitorWait test has been created that reliably > > > reproduces the hang in JDK[6789]; see the bug's gory details > > > for the specific versions where the hang has been reproduced > > > - the test reliably reproduces the hang in 5 seconds on my > > > T7600 running Solaris 10u11 X86; 1 minute runs reproduce > > > the hang reliably on other machines > > > - 12 hour stress run of the new test on Linux-X64, MacOS X-X64, > > > Solaris-SPARCV9, Solaris-X64, and Win7-X86 with the JPRT > > > bits did not reproduce the hang > > > - JPRT test job > > > - VM/SQE Adhoc test job on Server VM, fastdebug bits on Linux-X86, > > > Linux-X64, MacOS X-X64, Solaris-SPARCV9, Solaris-X64, Windows-X86, > > > and Windows-X64: > > > - vm.quick > > > - Kitchensink (bigapps) > > > - Weblogic+medrec (bigapps) > > > - runThese (bigapps) > > > > > > > > > The Gory Details Start Here > > > --------------------------- > > > > > > This is the old location of block of code that's being moved: > > > > > > src/share/vm/runtime/objectMonitor.cpp: > > > > > > 1440 void ObjectMonitor::wait(jlong millis, bool interruptible, > > TRAPS) { > > > <snip> > > > 1499 exit (true, Self) ; // exit the monitor > > > <snip> > > > 1513 if (node._notified != 0 && _succ == Self) { > > > 1514 node._event->unpark(); > > > 1515 } > > > > > > > > > This is the new location of block of code that's being moved: > > > > > > src/share/vm/runtime/objectMonitor.cpp: > > > > > > 1452 void ObjectMonitor::wait(jlong millis, bool interruptible, > > TRAPS) { > > > <snip> > > > 1601 if (JvmtiExport::should_post_monitor_waited()) { > > > 1602 JvmtiExport::post_monitor_waited(jt, this, ret == > > OS_TIMEOUT); > > > <snip> > > > 1604 if (node._notified != 0 && _succ == Self) { > > > <snip> > > > 1620 node._event->unpark(); > > > 1621 } > > > > > > > > > The Risks > > > --------- > > > > > > - The code now executes only when the JVMTI_EVENT_MONITOR_WAITED event > > > is enabled: > > > - previously it was always executed > > > - while the old code was not effective for the hang that is being > > > fixed with this bug, it is possible that the old code prevented > > > a different bug in the successor protocol from manifesting > > > - thorough analysis of the successor protocol did not reveal a > > > case where the old code was needed in the old location > > > - Thorough analysis indicates that the other JVM/TI monitor events > > > do not need a fix like the one for JVMTI_EVENT_MONITOR_WAITED: > > > - the successor protocol is complicated and the analysis could > > > be wrong when certain options are used > > > - comments were added to each location where a JVM/TI monitor > > > event handler is called documenting why a fix like this one > > > is not needed there > > > - if the analysis is wrong, the new comments show where a new > > > code change would be needed > > > > > > > > > The Scenario > > > ------------ > > > > > > I've created a scenario that reproduces this hang: > > > > > > T1 - enters monitor and calls monitor.wait() > > > T2 - enters the monitor, calls monitor.notify() and exits the monitor > > > T3 - enters and exits the monitor > > > T4 - enters the monitor, delays for 5 seconds, exits the monitor > > > > > > A JVM/TI agent that enables JVMTI_EVENT_MONITOR_WAITED and has a > > > handler that: enters a raw monitor, waits for 1ms, exits a raw monitor. > > > > > > Here are the six events necessary to make this hang happen: > > > > > > // KEY-EVENT-1a: After being unparked(), T1 has cleared the _succ > > field, but > > > // KEY-EVENT-1b: T3 is exiting the monitor and makes T1 the > > successor again. > > > > > > // KEY-EVENT-2a: The unpark() done by T3 when it made T1 the successor > > > // KEY-EVENT-2b: is consumed by the JVM/TI event handler. > > > > > > // KEY-EVENT-3a: T3 made T1 the successor > > > // KEY-EVENT-3b: but before T1 could reenter the monitor T4 grabbed it. > > > > > > // KEY-EVENT-4a: T1's TrySpin() call sees T4 as NotRunnable so > > > // KEY-EVENT-4b: T1 bails from TrySpin without touching _succ. > > > > > > // KEY-EVENT-5a: T4 sees that T1 is still the successor so > > > // KEY-EVENT-5b: T4 takes the quick exit path (no ExitEpilog) > > > > > > // KEY-EVENT-6a: T1 is about to park and it is the successor, but > > > // KEY-EVENT-6b: T3's unpark has been eaten by the JVM/TI event handler > > > // KEY-EVENT-6c: and T4 took the quick exit path. T1 is about to be > > stuck. > > > > > > > > > This bug is intertwined with: > > > > > > - The ObjectMonitor successor protocol > > > - the sharing of a ParkEvent between Java Monitors and JVM/TI > > RawMonitors > > > > > > There is a very long successor.notes attachment to JDK-8028073 that > > > attempts to describe the ObjectMonitor successor protocol. It's good > > > for putting pretty much anyone to sleep. > > > > > > Since this hang reproduces back to JDK6, this bug is taking the easily > > > backported solution of moving the original fix to the right location. > > > The following new bug has been filed for possible future work in this > > > area by the Serviceability Team: > > > > > > 8033399 add a separate ParkEvent for JVM/TI RawMonitor use > > > https://bugs.openjdk.java.net/browse/JDK-8033399 > > > > > > > > > The Symptoms > > > ------------ > > > > > > With intermittent hangs like this, it is useful to know what to look > > > for in order to determine if you are running into this issue: > > > > > > - if you aren't using a debugger or a profiler or some other > > > JVM/TI agent, then this hang is not the same as yours > > > - if your JVM/TI agent isn't using a JVMTI_EVENT_MONITOR_WAITED > > > event handler, then this hang is not the same as yours > > > - if your JVMTI_EVENT_MONITOR_WAITED event handler is not using > > > JVM/TI RawMonitors, then this hang is not the same as yours > > > - if your JVMTI_EVENT_MONITOR_WAITED event handler is calling > > > back into Java code, then you might just be insane and this > > > hang might be similar to yours. However, using a Java callback > > > in an event handler is an even bigger problem/risk so fix that > > > first. > > > - if you one or more threads blocked like this and making no > > > progress, then this hang might be the same as yours: > > > > > > "T1" #22 prio=5 os_prio=64 tid=0x00000000009ca800 nid=0x2f waiting > > for monitor e > > > ntry [0xfffffd7fc0231000] > > > java.lang.Thread.State: BLOCKED (on object monitor) > > > JavaThread state: _thread_blocked > > > Thread: 0x00000000009ca800 [0x2f] State: _at_safepoint > > _has_called_back 0 _at_p > > > oll_safepoint 0 > > > JavaThread state: _thread_blocked > > > at java.lang.Object.wait(Native Method) > > > - waiting on <0xfffffd7e6a2b6ff0> (a java.lang.String) > > > at java.lang.Object.wait(Object.java:502) > > > at SMW_WorkerThread.run(StressMonitorWait.java:103) > > > - locked <0xfffffd7e6a2b6ff0> (a java.lang.String) > > > > > > "T2" #23 prio=5 os_prio=64 tid=0x00000000009cc000 nid=0x30 waiting > > for monitor e > > > ntry [0xfffffd7fc0130000] > > > java.lang.Thread.State: BLOCKED (on object monitor) > > > JavaThread state: _thread_blocked > > > Thread: 0x00000000009cc000 [0x30] State: _at_safepoint > > _has_called_back 0 _at_p > > > oll_safepoint 0 > > > JavaThread state: _thread_blocked > > > at SMW_WorkerThread.run(StressMonitorWait.java:120) > > > - waiting to lock <0xfffffd7e6a2b6ff0> (a java.lang.String) > > > > > > "T3" #24 prio=5 os_prio=64 tid=0x00000000009ce000 nid=0x31 waiting > > for monitor e > > > ntry [0xfffffd7fc002f000] > > > java.lang.Thread.State: BLOCKED (on object monitor) > > > JavaThread state: _thread_blocked > > > Thread: 0x00000000009ce000 [0x31] State: _at_safepoint > > _has_called_back 0 _at_p > > > oll_safepoint 0 > > > JavaThread state: _thread_blocked > > > at SMW_WorkerThread.run(StressMonitorWait.java:139) > > > - waiting to lock <0xfffffd7e6a2b6ff0> (a java.lang.String) > > > > > > Key symptoms in thread T1: > > > > > > - had the object locked: > > > > > > locked <0xfffffd7e6a2b6ff0> (a java.lang.String) > > > > > > - did an Object.wait(): > > > > > > waiting on <0xfffffd7e6a2b6ff0> (a java.lang.String) > > > > > > - is blocked on reentry: > > > > > > waiting for monitor entry [0xfffffd7fc0231000] > > > > > > Key symptoms in thread T2: > > > > > > - is blocked waiting to lock the object: > > > > > > waiting for monitor entry [0xfffffd7fc0130000] > > > waiting to lock <0xfffffd7e6a2b6ff0> (a java.lang.String) > > > > > > Key symptoms in thread T3: > > > > > > - is blocked waiting to lock the object: > > > > > > waiting for monitor entry [0xfffffd7fc002f000] > > > waiting to lock <0xfffffd7e6a2b6ff0> (a java.lang.String) > > > > > > >