Dear Dan, See inline.
On Tue, Oct 20, 2015 at 7:29 AM, Daniel D. Daugherty < daniel.daughe...@oracle.com> wrote: > Carsten, > > It's been a long time! Thanks for the quick review... > It was a nice small change. Sorry, I am asking for a larger change. > Replies embedded below... > > On 10/19/15, 10:09 PM, Carsten Varming wrote: > > Dear Dan, > > I thought acquire/release semantics mandated that the release should be > paired with an acquire. > > > Yes, src/share/vm/runtime/orderAccess.hpp is pretty clear about > that. I modeled this fix after some other ObjectMonitor code > that doesn't (yet) follow that mandate. > > > In this case, the first line in verify_objmon_isinpool should be changed > from "PaddedEnd<ObjectMonitor> * block = (PaddedEnd<ObjectMonitor> > *)gBlockList" to use a load_acquire. Actually, it seems like there are a > lot of reads of gBlockList that should be replaced with a load_acquire in > synchronizer.cpp. > > > Yes, that would be a good change and cleanup. The really annoying > part is that the current fix seems to have resolved the bug... > > Current numbers: 700K iterations of 4 fastdebug instances in parallel > in 4.5 days. I typically saw one failure per day in a three day run... > I suspect the data dependency between "block = gBlockList" in the top of verify_objmon_isinpool and "block = block->FreeNext" at the bottom prevents the compiler from re-ordering the loads. Hence the assembly will be correct. Likewise the data dependency prevents the cpu from re-ordering the loads. Hence the program doesn't crash. Nevertheless, I encourage you to use the right acquire/release semantics. > > > Otherwise the change looks great. > > > Thanks! > You are welcome. Carsten > Dan > > > > Carsten > (not a reviewer). > > On Mon, Oct 19, 2015 at 8:02 PM, Daniel D. Daugherty < > daniel.daughe...@oracle.com> wrote: > >> Greetings, >> >> I have a fix for a long standing race between the lock-free ObjectMonitor >> verification code and the normal (locked) ObjectMonitor block allocation >> code path. For this fix, I would like at least a Runtime team reviewer >> and a Serviceability team reviewer. Thanks! >> >> JDK-8047212 runtime/ParallelClassLoading/bootstrap/random/inner-complex >> assert(ObjectSynchronizer::verify_objmon_isinpool(inf)) >> failed: >> monitor is invalid >> https://bugs.openjdk.java.net/browse/JDK-8047212 >> >> Webrev URL: >> http://cr.openjdk.java.net/~dcubed/8047212-webrev/0-jdk9-hs-rt/ >> >> Testing: Aurora Adhoc RT-SVC nightly batch >> 4 inner-complex fastdebug parallel runs for 4+ days and >> 600K iterations without seeing this failure; the experiment >> is still running; final results to be reported at the end >> of the review cycle >> JPRT -testset hotspot >> >> This fix: >> >> - makes ObjectMonitor::gBlockList volatile >> - uses "OrderAccess::release_store_ptr(&gBlockList, temp)" to >> make sure the new block updates _happen before_ gBlockList is >> changed to refer to the new block >> - add SA support for a "static pointer volatile" field like: >> >> static ObjectMonitor * volatile gBlockList; >> >> See the following link for a nice description of what "volatile" >> means in the different positions on a variable/parameter decl line: >> >> >> http://www.embedded.com/electronics-blogs/beginner-s-corner/4023801/Introduction-to-the-Volatile-Keyword >> >> Thanks, in advance, for any comments, questions or suggestions. >> >> Dan >> >> >