RFR: 8257831: Suspend with handshakes

Robbin Ehn Thu, 25 Mar 2021 06:19:55 -0700

A suspend request is done by handshaking thread target thread(s). When 
executing the handshake operation we know the target mutator thread is in a 
dormant state (as in safepoint safe state). We have a guarantee that it will 
check it's poll before leaving the dormant state. To stop the thread from 
leaving the the dormant state we install a second asynchronous handshake to be 
executed by the targeted thread. The asynchronous handshake will wait on a 
monitor while the thread is suspended. The target thread cannot not leave the 
dormant state without a resume request.


Per thread suspend requests are naturally serialized by the per thread 
HandshakeState lock (we can only execute one handshake at a time per thread).
Instead of having a separate lock we use this to our advantage and use 
HandshakeState lock for serializing access to the suspend flag and for 
wait/notify. 

Suspend:
Requesting thread -> synchronous handshake -> target thread
Inside synchronus handshake (HandshakeState lock is locked while
executing any handshake):
        - Set suspended flag
        - Install asynchronous handshake

Target thread -> tries to leave dormant state -> Executes handshakes
Target only executes asynchronous handshake:
        - While suspended
        - Go to blocked
        - Wait on HandshakeState lock

Resume:
Resuming thread:
        - Lock HandshakeState lock
        - Clear suspended flag
        - Notify HandshakeState lock
        - Unlock HandshakeState lock

The "suspend requested" flag is an optimization, without it a dormant thread 
could be suspended and resumed many times and each would add a new asynchronous 
handshake. Suspend requested flag means there is already an asynchronous 
suspend handshake in queue which can be re-used, only the suspend flag needs to 
be set.

----
Some code can be simplified or done in a smarter way but I refrained from doing 
such changes instead tried to keep existing code as is as far as possible. This 
concerns especially raw monitors.

----
Regarding the changed test, the documentation says:
"If the calling thread is specified in the request_list array, this function 
will not return until some other thread resumes it."

But the code:
  LOG("suspendTestedThreads: before JVMTI SuspendThreadList");
  err = jvmti->SuspendThreadList(threads_count, threads, results);
  ...
  // Allow the Main thread to inspect the result of tested threads suspension   
  agent_unlock(jni);
  
The thread will never return from SuspendThreadList until resumed, so it cannot 
unlock with agent_unlock().
Thus main thread is stuck forever on:
  // Block until the suspender thread competes the tested threads suspension    
  agent_lock(jni);

And never checks and resumes the threads. So I removed that lock instead just 
sleep and check until all thread have the expected suspended state.

----

This version already contains updates after pre-review comments from 
@dcubed-ojdk, @pchilano, @coleenp.
(Pre-review comments here:
https://github.com/openjdk/jdk/pull/2625)

---- 
Testing t1-t8, nsk_jdi/nsk_jvmti/jdk_jdi/tck, KS24, RunThese and
combinations like running with -XX:ZCollectionInterval=0.01 -
XX:ZFragmentationLimit=0.
Running above some of above concurrently (load ~240), slow debug,
etc...

-------------

Commit messages:
 - 8257831: Suspend with handshake (review baseline)

Changes: https://git.openjdk.java.net/jdk/pull/3191/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3191&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8257831
  Stats: 1315 lines in 38 files changed: 271 ins; 855 del; 189 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3191.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3191/head:pull/3191

PR: https://git.openjdk.java.net/jdk/pull/3191

RFR: 8257831: Suspend with handshakes

Reply via email to