On Thu, 8 Apr 2021 07:17:48 GMT, Robbin Ehn <r...@openjdk.org> wrote:
>> A suspend request is done by handshaking thread target thread(s). When >> executing the handshake operation we know the target mutator thread is in a >> dormant state (as in safepoint safe state). We have a guarantee that it will >> check it's poll before leaving the dormant state. To stop the thread from >> leaving the the dormant state we install a second asynchronous handshake to >> be executed by the targeted thread. The asynchronous handshake will wait on >> a monitor while the thread is suspended. The target thread cannot not leave >> the dormant state without a resume request. >> >> Per thread suspend requests are naturally serialized by the per thread >> HandshakeState lock (we can only execute one handshake at a time per thread). >> Instead of having a separate lock we use this to our advantage and use >> HandshakeState lock for serializing access to the suspend flag and for >> wait/notify. >> >> Suspend: >> Requesting thread -> synchronous handshake -> target thread >> Inside synchronus handshake (HandshakeState lock is locked while >> executing any handshake): >> - Set suspended flag >> - Install asynchronous handshake >> >> Target thread -> tries to leave dormant state -> Executes handshakes >> Target only executes asynchronous handshake: >> - While suspended >> - Go to blocked >> - Wait on HandshakeState lock >> >> Resume: >> Resuming thread: >> - Lock HandshakeState lock >> - Clear suspended flag >> - Notify HandshakeState lock >> - Unlock HandshakeState lock >> >> The "suspend requested" flag is an optimization, without it a dormant thread >> could be suspended and resumed many times and each would add a new >> asynchronous handshake. Suspend requested flag means there is already an >> asynchronous suspend handshake in queue which can be re-used, only the >> suspend flag needs to be set. >> >> ---- >> Some code can be simplified or done in a smarter way but I refrained from >> doing such changes instead tried to keep existing code as is as far as >> possible. This concerns especially raw monitors. >> >> ---- >> Regarding the changed test, the documentation says: >> "If the calling thread is specified in the request_list array, this function >> will not return until some other thread resumes it." >> >> But the code: >> LOG("suspendTestedThreads: before JVMTI SuspendThreadList"); >> err = jvmti->SuspendThreadList(threads_count, threads, results); >> ... >> // Allow the Main thread to inspect the result of tested threads >> suspension >> agent_unlock(jni); >> >> The thread will never return from SuspendThreadList until resumed, so it >> cannot unlock with agent_unlock(). >> Thus main thread is stuck forever on: >> // Block until the suspender thread competes the tested threads suspension >> agent_lock(jni); >> >> And never checks and resumes the threads. So I removed that lock instead >> just sleep and check until all thread have the expected suspended state. >> >> ---- >> >> This version already contains updates after pre-review comments from >> @dcubed-ojdk, @pchilano, @coleenp. >> (Pre-review comments here: >> https://github.com/openjdk/jdk/pull/2625) >> >> ---- >> Testing t1-t8, nsk_jdi/nsk_jvmti/jdk_jdi/tck, KS24, RunThese and >> combinations like running with -XX:ZCollectionInterval=0.01 - >> XX:ZFragmentationLimit=0. >> Running above some of above concurrently (load ~240), slow debug, >> etc... > > Robbin Ehn has updated the pull request with a new target base due to a merge > or a rebase. The pull request now contains six commits: > > - White space fixes > - Merge branch 'master' into SuspendInHandshake > - Review fixes > - Merge branch 'master' into SuspendInHandshake > - Merge branch 'master' into SuspendInHandshake > - 8257831: Suspend with handshake (review baseline) I'm still really happy with this fix! Thumbs up! Reminder: don't forget to decide what to do about the obsolete options. @dholmes-ora can provide guidance on how to handle this type of removal. src/hotspot/share/runtime/handshake.cpp line 632: > 630: } > 631: > 632: void HandshakeState::self_suspened() { Typo: s/self_suspened/self_suspended/ src/hotspot/share/runtime/handshake.cpp line 654: > 652: void do_thread(Thread* thr) { > 653: JavaThread* target = thr->as_Java_thread(); > 654: target->handshake_state()->self_suspened(); Typo: s/self_suspened/self_suspended/ src/hotspot/share/runtime/handshake.hpp line 131: > 129: return true; > 130: } > 131: OrderAccess::loadload(); Needs a comment explaining what the memory sync is for. src/hotspot/share/runtime/handshake.hpp line 156: > 154: // This flag is true while there is async handshake (trap) > 155: // on queue. Since we do only need one, we can reuse it if > 156: // thread thread gets suspended again (after a resume) typo: s/thread thread/thread/ src/hotspot/share/runtime/objectMonitor.cpp line 410: > 408: > 409: current->frame_anchor()->make_walkable(current); > 410: OrderAccess::storestore(); Needs a comment explaining what the memory sync is for. src/hotspot/share/runtime/objectMonitor.cpp line 970: > 968: > 969: current->frame_anchor()->make_walkable(current); > 970: OrderAccess::storestore(); Needs a comment explaining what the memory sync is for. src/hotspot/share/runtime/objectMonitor.cpp line 977: > 975: if (_succ == current) { > 976: _succ = NULL; > 977: OrderAccess::fence(); Please add a comment to this line: `OrderAccess::fence(); // always do a full fence when successor is cleared` src/hotspot/share/runtime/objectMonitor.cpp line 1540: > 1538: { > 1539: current->frame_anchor()->make_walkable(current); > 1540: OrderAccess::storestore(); Needs a comment explaining what the memory sync is for. src/hotspot/share/runtime/objectMonitor.cpp line 1555: > 1553: if (_succ == current) { > 1554: _succ = NULL; > 1555: OrderAccess::fence(); Please add a comment to this line: OrderAccess::fence(); // always do a full fence when successor is cleared src/hotspot/share/runtime/thread.cpp line 455: > 453: ParkEvent::Release(_ParkEvent); > 454: // Can be racingly loaded in signal handler via has_terminated() > 455: Atomic::store(&_ParkEvent, (ParkEvent*)NULL); Nice solution for that odd signal handler issue. ------------- Marked as reviewed by dcubed (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/3191