On Wed, 14 Apr 2021 06:48:40 GMT, Robbin Ehn <r...@openjdk.org> wrote:
>> A suspend request is done by handshaking thread target thread(s). When >> executing the handshake operation we know the target mutator thread is in a >> dormant state (as in safepoint safe state). We have a guarantee that it will >> check it's poll before leaving the dormant state. To stop the thread from >> leaving the the dormant state we install a second asynchronous handshake to >> be executed by the targeted thread. The asynchronous handshake will wait on >> a monitor while the thread is suspended. The target thread cannot not leave >> the dormant state without a resume request. >> >> Per thread suspend requests are naturally serialized by the per thread >> HandshakeState lock (we can only execute one handshake at a time per thread). >> Instead of having a separate lock we use this to our advantage and use >> HandshakeState lock for serializing access to the suspend flag and for >> wait/notify. >> >> Suspend: >> Requesting thread -> synchronous handshake -> target thread >> Inside synchronus handshake (HandshakeState lock is locked while >> executing any handshake): >> - Set suspended flag >> - Install asynchronous handshake >> >> Target thread -> tries to leave dormant state -> Executes handshakes >> Target only executes asynchronous handshake: >> - While suspended >> - Go to blocked >> - Wait on HandshakeState lock >> >> Resume: >> Resuming thread: >> - Lock HandshakeState lock >> - Clear suspended flag >> - Notify HandshakeState lock >> - Unlock HandshakeState lock >> >> The "suspend requested" flag is an optimization, without it a dormant thread >> could be suspended and resumed many times and each would add a new >> asynchronous handshake. Suspend requested flag means there is already an >> asynchronous suspend handshake in queue which can be re-used, only the >> suspend flag needs to be set. >> >> ---- >> Some code can be simplified or done in a smarter way but I refrained from >> doing such changes instead tried to keep existing code as is as far as >> possible. This concerns especially raw monitors. >> >> ---- >> Regarding the changed test, the documentation says: >> "If the calling thread is specified in the request_list array, this function >> will not return until some other thread resumes it." >> >> But the code: >> LOG("suspendTestedThreads: before JVMTI SuspendThreadList"); >> err = jvmti->SuspendThreadList(threads_count, threads, results); >> ... >> // Allow the Main thread to inspect the result of tested threads >> suspension >> agent_unlock(jni); >> >> The thread will never return from SuspendThreadList until resumed, so it >> cannot unlock with agent_unlock(). >> Thus main thread is stuck forever on: >> // Block until the suspender thread competes the tested threads suspension >> agent_lock(jni); >> >> And never checks and resumes the threads. So I removed that lock instead >> just sleep and check until all thread have the expected suspended state. >> >> ---- >> >> This version already contains updates after pre-review comments from >> @dcubed-ojdk, @pchilano, @coleenp. >> (Pre-review comments here: >> https://github.com/openjdk/jdk/pull/2625) >> >> ---- >> Testing t1-t8, nsk_jdi/nsk_jvmti/jdk_jdi/tck, KS24, RunThese and >> combinations like running with -XX:ZCollectionInterval=0.01 - >> XX:ZFragmentationLimit=0. >> Running above some of above concurrently (load ~240), slow debug, >> etc... > > Robbin Ehn has updated the pull request incrementally with one additional > commit since the last revision: > > Fixed flag undef dep + spelling error Reviewed v07 incremental. Still looks good. Update: Did a crawl thru review of v07 full from the bottom up. I'm hoping the change in review order results in less "change blindness" since I've reviewed so many times. There are a few left overs that still need to be address. There are a couple buried in "resolved comments" where it looks like the issue that I raised is not actually resolved. src/hotspot/share/runtime/handshake.hpp line 99: > 97: // but we need to check for a safepoint before. > 98: // (this is due to a suspension handshake which put the JavaThread in > blocked > 99: // state so a safepoint may be in-progress) nit: s/(this/(This/ nit: s/in-progress)/in-progress.)/ src/hotspot/share/runtime/handshake.hpp line 166: > 164: // thread gets suspended again (after a resume) > 165: // and we have not yet processed it. > 166: bool _async_suspend_handshake; Does this need to be `volatile`? src/hotspot/share/runtime/handshake.hpp line 177: > 175: void set_suspended(bool to) { return > Atomic::store(&_suspended, to); } > 176: bool has_async_suspend_handshake() { return > _async_suspend_handshake; } > 177: void set_async_suspend_handshake(bool to) { _async_suspend_handshake = > to; } Does this need to be an Atomic::store? src/hotspot/share/runtime/objectMonitor.cpp line 424: > 422: // thread that suspended us. > 423: _recursions = 0; > 424: _succ = NULL; You might want to add a comment here: // Don't need a full fence after clearing successor here because of the call to exit(). ------------- Marked as reviewed by dcubed (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/3191