A suspend request is done by handshaking thread target thread(s). When executing the handshake operation we know the target mutator thread is in a dormant state (as in safepoint safe state). We have a guarantee that it will check it's poll before leaving the dormant state. To stop the thread from leaving the the dormant state we install a second asynchronous handshake to be executed by the targeted thread. The asynchronous handshake will wait on a monitor while the thread is suspended. The target thread cannot not leave the dormant state without a resume request.
Per thread suspend requests are naturally serialized by the per thread HandshakeState lock (we can only execute one handshake at a time per thread). Instead of having a separate lock we use this to our advantage and use HandshakeState lock for serializing access to the suspend flag and for wait/notify. Suspend: Requesting thread -> synchronous handshake -> target thread Inside synchronus handshake (HandshakeState lock is locked while executing any handshake): - Set suspended flag - Install asynchronous handshake Target thread -> tries to leave dormant state -> Executes handshakes Target only executes asynchronous handshake: - While suspended - Go to blocked - Wait on HandshakeState lock Resume: Resuming thread: - Lock HandshakeState lock - Clear suspended flag - Notify HandshakeState lock - Unlock HandshakeState lock The "suspend requested" flag is an optimization, without it a dormant thread could be suspended and resumed many times and each would add a new asynchronous handshake. Suspend requested flag means there is already an asynchronous suspend handshake in queue which can be re-used, only the suspend flag needs to be set. ---- Some code can be simplified or done in a smarter way but I refrained from doing such changes instead tried to keep existing code as is as far as possible. This concerns especially raw monitors. ---- Regarding the changed test, the documentation says: "If the calling thread is specified in the request_list array, this function will not return until some other thread resumes it." But the code: LOG("suspendTestedThreads: before JVMTI SuspendThreadList"); err = jvmti->SuspendThreadList(threads_count, threads, results); ... // Allow the Main thread to inspect the result of tested threads suspension agent_unlock(jni); The thread will never return from SuspendThreadList until resumed, so it cannot unlock with agent_unlock(). Thus main thread is stuck forever on: // Block until the suspender thread competes the tested threads suspension agent_lock(jni); And never checks and resumes the threads. So I removed that lock instead just sleep and check until all thread have the expected suspended state. ---- This version already contains updates after pre-review comments from @dcubed-ojdk, @pchilano, @coleenp. (Pre-review comments here: https://github.com/openjdk/jdk/pull/2625) ---- Testing t1-t8, nsk_jdi/nsk_jvmti/jdk_jdi/tck, KS24, RunThese and combinations like running with -XX:ZCollectionInterval=0.01 - XX:ZFragmentationLimit=0. Running above some of above concurrently (load ~240), slow debug, etc... ------------- Commit messages: - 8257831: Suspend with handshake (review baseline) Changes: https://git.openjdk.java.net/jdk/pull/3191/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3191&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8257831 Stats: 1315 lines in 38 files changed: 271 ins; 855 del; 189 mod Patch: https://git.openjdk.java.net/jdk/pull/3191.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3191/head:pull/3191 PR: https://git.openjdk.java.net/jdk/pull/3191