Dan, I did not do a complete review - I have been following the conversation and you all seem to have this completely in hand so I didn't jump in. Yell if need more.
thanks, Karen On Sep 2, 2015, at 3:27 PM, Daniel D. Daugherty wrote: > On 9/2/15 12:50 PM, Karen Kinnear wrote: >> Dan, >> >> Can you possibly change the two "in a parallel" to "in parallel" ? > > Nice catch. Will fix. > > Did you do a complete review or did that glaring typo just > jump out at you? > > Dan > > >> >> thanks, >> Karen >> >> On Sep 2, 2015, at 1:06 PM, Tom Benson wrote: >> >>> Looks good to me! >>> Tnx, >>> Tom >>> >>> On 9/2/2015 12:40 PM, Daniel D. Daugherty wrote: >>>> Just for the record, here are the comment context diffs: >>>> >>>> $ diff -c src/share/vm/runtime/perfMemory.cpp{.cr2,}*** >>>> src/share/vm/runtime/perfMemory.cpp.cr2 Tue Sep 1 19:39:45 2015 >>>> --- src/share/vm/runtime/perfMemory.cpp Wed Sep 2 09:35:48 2015 >>>> *************** >>>> *** 70,76 **** >>>> // objects that are currently being used by running JavaThreads >>>> // or the StatSampler. This method is invoked while we are not at >>>> // a safepoint during a VM abort so leaving the PerfData objects >>>> ! // around may also help diagnose the failure. >>>> // >>>> if (SafepointSynchronize::is_at_safepoint() && >>>> !StatSampler::is_active()) { >>>> PerfDataManager::destroy(); >>>> --- 70,78 ---- >>>> // objects that are currently being used by running JavaThreads >>>> // or the StatSampler. This method is invoked while we are not at >>>> // a safepoint during a VM abort so leaving the PerfData objects >>>> ! // around may also help diagnose the failure. In rare cases, >>>> ! // PerfData objects are used in parallel with a safepoint. See >>>> ! // the work around in PerfDataManager::destroy(). >>>> // >>>> if (SafepointSynchronize::is_at_safepoint() && >>>> !StatSampler::is_active()) { >>>> PerfDataManager::destroy(); >>>> >>>> >>>> $ diff -c src/share/vm/runtime/objectMonitor.cpp{.cr2,} >>>> *** src/share/vm/runtime/objectMonitor.cpp.cr2 Tue Sep 1 19:23:35 2015 >>>> --- src/share/vm/runtime/objectMonitor.cpp Wed Sep 2 09:37:08 2015 >>>> *************** >>>> *** 572,577 **** >>>> --- 572,579 ---- >>>> // That is by design - we trade "lossy" counters which are exposed to >>>> // races during updates for a lower probe effect. >>>> TEVENT(Inflated enter - Futile wakeup); >>>> + // This PerfData object can be used in a parallel with a safepoint. >>>> + // See the work around in PerfDataManager::destroy(). >>>> OM_PERFDATA_OP(FutileWakeups, inc()); >>>> ++nWakeups; >>>> >>>> *************** >>>> *** 744,749 **** >>>> --- 746,753 ---- >>>> // *must* retry _owner before parking. >>>> OrderAccess::fence(); >>>> >>>> + // This PerfData object can be used in a parallel with a safepoint. >>>> + // See the work around in PerfDataManager::destroy(). >>>> OM_PERFDATA_OP(FutileWakeups, inc()); >>>> } >>>> >>>> >>>> Dan >>>> >>>> >>>> On 9/2/15 10:03 AM, Daniel D. Daugherty wrote: >>>>> On 9/2/15 9:40 AM, Tom Benson wrote: >>>>>> Hi Dan, >>>>>> OK. I didn't review what was added in round 1 once you said it was all >>>>>> removed for round 2. 8^) >>>>> Not "all", but I did remove "most" of the round 1 changes :-) >>>>> The changes I kept are called in the list below. >>>>> >>>>> >>>>>> It would be great if what you have in your first paragraph below was >>>>>> added to the comments. I think the existing comment in perfMemory_exit >>>>>> implies we're safe to remove the PerfData objects without fear of them >>>>>> being in use because we're at a safepoint. >>>>> I think I'll add this sentence to the comment in perfMemory_exit(): >>>>> >>>>> // In rare cases, PerfData objects are used in parallel with a >>>>> // safepoint. See the work around in PerfDataManager::destroy(). >>>>> >>>>> >>>>>> Perhaps better to have it (the new comment) in >>>>>> PerfDataManager::destroy(), because it seems so weird to have a sleep >>>>>> in the VM thread during a safepoint, even in a shutdown path. >>>>> I think the PerfDataManager::destroy() comment is clear about >>>>> the race we're trying avoid. Again, if you have specific wording >>>>> changes to suggest to make it more clear... I'll take them. :-) >>>>> >>>>> I think I'll also add this comment: >>>>> >>>>> // This PerfData object can be used in a parallel with a safepoint. >>>>> // See the work around in PerfDataManager::destroy(). >>>>> >>>>> above these lines in src/share/vm/runtime/objectMonitor.cpp: >>>>> >>>>> 575 OM_PERFDATA_OP(FutileWakeups, inc()); >>>>> 747 OM_PERFDATA_OP(FutileWakeups, inc()); >>>>> >>>>> >>>>>> Any interest in asserting that you're at a safepoint in >>>>>> PerfDataManager::destroy? Just a thought. >>>>> I'd rather not add an assert() at this time. >>>>> >>>>> Are you good with the above comment additions? Do you need to >>>>> see another webrev when I make those changes? >>>>> >>>>> Dan >>>>> >>>>> >>>>>> Tom >>>>>> >>>>>> On 9/2/2015 11:15 AM, Daniel D. Daugherty wrote: >>>>>>> On 9/2/15 8:49 AM, Tom Benson wrote: >>>>>>>> Hi, >>>>>>>> I'm a bit confused on one point... Since you now only call >>>>>>>> PerfDataManager::destroy at a safepoint, why do you still have the >>>>>>>> comment about 'the race' and the sleep? >>>>>>> Because the two "futile wakeup" counter updates in the monitor >>>>>>> subsystem can execute in parallel with a safepoint. The JavaThread >>>>>>> state is "blocked" so the safepoint subsystem will see the JavaThread >>>>>>> as "at a safepoint" when it is actually executing the code to >>>>>>> increment the counter. >>>>>>> >>>>>>> That's what the "is_safe" parameter to the OM_PERFDATA_OP macro was >>>>>>> all about in the round 1 code review. However, David convinced me >>>>>>> that all that logic didn't guarantee we wouldn't hit the race so >>>>>>> I ripped it all out in the round 2 code review (this one). >>>>>>> >>>>>>> Does this help your confusion? >>>>>>> >>>>>>> Dan >>>>>>> >>>>>>> >>>>>>>> Tom >>>>>>>> >>>>>>>> On 9/2/2015 7:52 AM, Daniel D. Daugherty wrote: >>>>>>>>> David, >>>>>>>>> >>>>>>>>> Thanks for the very fast re-review! >>>>>>>>> >>>>>>>>> Enjoy your vacation! >>>>>>>>> >>>>>>>>> Dan >>>>>>>>> >>>>>>>>> >>>>>>>>> On 9/2/15 2:54 AM, David Holmes wrote: >>>>>>>>>> Hi Dan, >>>>>>>>>> >>>>>>>>>> On 2/09/2015 2:45 PM, Daniel D. Daugherty wrote: >>>>>>>>>>> Greetings, >>>>>>>>>>> >>>>>>>>>>> I've updated the "fix" for this bug based on code review comments >>>>>>>>>>> received in round 1. I've dropped most of the changes from round 1 >>>>>>>>>>> with a couple of exceptions. >>>>>>>>>> I have no further comments - it all looks good to me. If others want >>>>>>>>>> the pendulum to swing back a little from this position then ... >>>>>>>>>> nothing that has been suggested is functionally wrong. :) >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> David >>>>>>>>>> ----- >>>>>>>>>> >>>>>>>>>> PS. When you get back from vacation I'll be gone for a month. That >>>>>>>>>> gives you a large window to push other things through with less >>>>>>>>>> stress ;-) >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> JDK-8049304 race between VM_Exit and _sync_FutileWakeups->inc() >>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8049304 >>>>>>>>>>> >>>>>>>>>>> Webrev URL: >>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/2-jdk9-hs-rt/ >>>>>>>>>>> >>>>>>>>>>> The easiest way to re-review is to download the two patch files >>>>>>>>>>> (round 0 and round 2) and view them in your favorite file merge >>>>>>>>>>> tool: >>>>>>>>>>> >>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/0-jdk9-hs-rt/hotspot.patch >>>>>>>>>>> >>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/2-jdk9-hs-rt/hotspot.patch >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Testing: Aurora Adhoc RT-SVC nightly batch (in process) >>>>>>>>>>> Aurora Adhoc vm.tmtools batch (in process) >>>>>>>>>>> Kim's repro sequence for JDK-8049304 >>>>>>>>>>> Kim's repro sequence for JDK-8129978 >>>>>>>>>>> JPRT -testset hotspot >>>>>>>>>>> >>>>>>>>>>> Changes between round 0 and round 2: >>>>>>>>>>> >>>>>>>>>>> - clarify a few comments >>>>>>>>>>> - init _has_PerfData flag with '0' (instead of false) >>>>>>>>>>> - drop unnecessary use OrderAccess::release_store() to set >>>>>>>>>>> _has_PerfData to '1' (we're in a Mutex) >>>>>>>>>>> - change perfMemory_exit() to only call PerfDataManager::destroy() >>>>>>>>>>> when called at a safepoint and when the StatSampler is not >>>>>>>>>>> running; this means when the VM is aborting, we no longer have >>>>>>>>>>> a race between the original crash report and this code path. >>>>>>>>>>> >>>>>>>>>>> Changes between round 1 and round 2: >>>>>>>>>>> >>>>>>>>>>> - clarify a few comments >>>>>>>>>>> - drop is_safe parameter to OM_PERFDATA_OP macro >>>>>>>>>>> - init _has_PerfData flag with '0' (instead of false) >>>>>>>>>>> - drop OrderAccess::fence() call before os::naked_short_sleep() call >>>>>>>>>>> - drop PerfDataManager::has_PerfData_with_acquire() >>>>>>>>>>> - drop unnecessary use OrderAccess::release_store() to set >>>>>>>>>>> _has_PerfData to '1' (we're in a Mutex) >>>>>>>>>>> >>>>>>>>>>> I believe that I've addressed all comments from round 0 and >>>>>>>>>>> from round 1. >>>>>>>>>>> >>>>>>>>>>> Thanks, in advance, for any comments, questions or suggestions. >>>>>>>>>>> >>>>>>>>>>> Dan >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 8/31/15 4:51 PM, Daniel D. Daugherty wrote: >>>>>>>>>>>> Greetings, >>>>>>>>>>>> >>>>>>>>>>>> I've updated the "fix" for this bug based on code review comments >>>>>>>>>>>> received in round 0. >>>>>>>>>>>> >>>>>>>>>>>> JDK-8049304 race between VM_Exit and _sync_FutileWakeups->inc() >>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8049304 >>>>>>>>>>>> >>>>>>>>>>>> Webrev URL: >>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/1-jdk9-hs-rt/ >>>>>>>>>>>> >>>>>>>>>>>> The easiest way to re-review is to download the two patch files >>>>>>>>>>>> and view them in your favorite file merge tool: >>>>>>>>>>>> >>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/0-jdk9-hs-rt/hotspot.patch >>>>>>>>>>>> >>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/1-jdk9-hs-rt/hotspot.patch >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Testing: Aurora Adhoc RT-SVC nightly batch (in process) >>>>>>>>>>>> Aurora Adhoc vm.tmtools batch (in process) >>>>>>>>>>>> Kim's repro sequence for JDK-8049304 >>>>>>>>>>>> Kim's repro sequence for JDK-8129978 >>>>>>>>>>>> JPRT -testset hotspot >>>>>>>>>>>> >>>>>>>>>>>> Changes between round 0 and round 1: >>>>>>>>>>>> >>>>>>>>>>>> - add an 'is_safe' parameter to the OM_PERFDATA_OP macro; >>>>>>>>>>>> safepoint-safe callers can access _has_PerfData flag directly; >>>>>>>>>>>> non-safepoint-safe callers use a load-acquire to fetch the >>>>>>>>>>>> current _has_PerfData flag value >>>>>>>>>>>> - change PerfDataManager::destroy() to simply set _has_PerfData >>>>>>>>>>>> to zero (field is volatile) and then use a fence() to prevent >>>>>>>>>>>> any reordering of operations in any direction; it's only done >>>>>>>>>>>> once during VM shutdown so... >>>>>>>>>>>> - change perfMemory_exit() to only call PerfDataManager::destroy() >>>>>>>>>>>> when called at a safepoint and when the StatSample is not >>>>>>>>>>>> running; this means when the VM is aborting, we no longer have >>>>>>>>>>>> a race between the original crash report and this code path. >>>>>>>>>>>> >>>>>>>>>>>> I believe that I've addressed all comments from round 0. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, in advance, for any comments, questions or suggestions. >>>>>>>>>>>> >>>>>>>>>>>> Dan >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 8/25/15 3:08 PM, Daniel D. Daugherty wrote: >>>>>>>>>>>>> Greetings, >>>>>>>>>>>>> >>>>>>>>>>>>> I have a "fix" for a long standing race between JVM shutdown and >>>>>>>>>>>>> the >>>>>>>>>>>>> JVM statistics subsystem: >>>>>>>>>>>>> >>>>>>>>>>>>> JDK-8049304 race between VM_Exit and _sync_FutileWakeups->inc() >>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8049304 >>>>>>>>>>>>> >>>>>>>>>>>>> Webrev URL: >>>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/0-jdk9-hs-rt/ >>>>>>>>>>>>> >>>>>>>>>>>>> Testing: Aurora Adhoc RT-SVC nightly batch >>>>>>>>>>>>> Aurora Adhoc vm.tmtools batch >>>>>>>>>>>>> Kim's repro sequence for JDK-8049304 >>>>>>>>>>>>> Kim's repro sequence for JDK-8129978 >>>>>>>>>>>>> JPRT -testset hotspot >>>>>>>>>>>>> >>>>>>>>>>>>> This "fix": >>>>>>>>>>>>> >>>>>>>>>>>>> - adds a volatile flag to record whether PerfDataManager is >>>>>>>>>>>>> holding >>>>>>>>>>>>> data (PerfData objects) >>>>>>>>>>>>> - adds PerfDataManager::has_PerfData() to return the flag >>>>>>>>>>>>> - changes the Java monitor subsystem's use of PerfData to >>>>>>>>>>>>> check both allocation of the monitor subsystem specific >>>>>>>>>>>>> PerfData object and the new PerfDataManager::has_PerfData() >>>>>>>>>>>>> return value >>>>>>>>>>>>> >>>>>>>>>>>>> If the global 'UsePerfData' option is false, the system works as >>>>>>>>>>>>> it did before. If 'UsePerfData' is true (the default on >>>>>>>>>>>>> non-embedded >>>>>>>>>>>>> systems), the Java monitor subsystem will allocate a number of >>>>>>>>>>>>> PerfData objects to record information. The objects will record >>>>>>>>>>>>> information about Java monitor subsystem until the JVM shuts down. >>>>>>>>>>>>> >>>>>>>>>>>>> When the JVM starts to shutdown, the new PerfDataManager flag will >>>>>>>>>>>>> change to false and the Java monitor subsystem will stop using the >>>>>>>>>>>>> PerfData objects. This is the new behavior. As noted in the >>>>>>>>>>>>> comments >>>>>>>>>>>>> I added to the code, the race is still present; I'm just changing >>>>>>>>>>>>> the order and the timing to reduce the likelihood of the crash. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, in advance, for any comments, questions or suggestions. >>>>>>>>>>>>> >>>>>>>>>>>>> Dan >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>> >