On Thu, 8 Apr 2021 07:17:48 GMT, Robbin Ehn <r...@openjdk.org> wrote:

>> A suspend request is done by handshaking thread target thread(s). When 
>> executing the handshake operation we know the target mutator thread is in a 
>> dormant state (as in safepoint safe state). We have a guarantee that it will 
>> check it's poll before leaving the dormant state. To stop the thread from 
>> leaving the the dormant state we install a second asynchronous handshake to 
>> be executed by the targeted thread. The asynchronous handshake will wait on 
>> a monitor while the thread is suspended. The target thread cannot not leave 
>> the dormant state without a resume request.
>> 
>> Per thread suspend requests are naturally serialized by the per thread 
>> HandshakeState lock (we can only execute one handshake at a time per thread).
>> Instead of having a separate lock we use this to our advantage and use 
>> HandshakeState lock for serializing access to the suspend flag and for 
>> wait/notify. 
>> 
>> Suspend:
>> Requesting thread -> synchronous handshake -> target thread
>> Inside synchronus handshake (HandshakeState lock is locked while
>> executing any handshake):
>>      - Set suspended flag
>>      - Install asynchronous handshake
>> 
>> Target thread -> tries to leave dormant state -> Executes handshakes
>> Target only executes asynchronous handshake:
>>      - While suspended
>>      - Go to blocked
>>      - Wait on HandshakeState lock
>> 
>> Resume:
>> Resuming thread:
>>      - Lock HandshakeState lock
>>      - Clear suspended flag
>>      - Notify HandshakeState lock
>>      - Unlock HandshakeState lock
>> 
>> The "suspend requested" flag is an optimization, without it a dormant thread 
>> could be suspended and resumed many times and each would add a new 
>> asynchronous handshake. Suspend requested flag means there is already an 
>> asynchronous suspend handshake in queue which can be re-used, only the 
>> suspend flag needs to be set.
>> 
>> ----
>> Some code can be simplified or done in a smarter way but I refrained from 
>> doing such changes instead tried to keep existing code as is as far as 
>> possible. This concerns especially raw monitors.
>> 
>> ----
>> Regarding the changed test, the documentation says:
>> "If the calling thread is specified in the request_list array, this function 
>> will not return until some other thread resumes it."
>> 
>> But the code:
>>   LOG("suspendTestedThreads: before JVMTI SuspendThreadList");
>>   err = jvmti->SuspendThreadList(threads_count, threads, results);
>>   ...
>>   // Allow the Main thread to inspect the result of tested threads 
>> suspension        
>>   agent_unlock(jni);
>>   
>> The thread will never return from SuspendThreadList until resumed, so it 
>> cannot unlock with agent_unlock().
>> Thus main thread is stuck forever on:
>>   // Block until the suspender thread competes the tested threads suspension 
>>   agent_lock(jni);
>> 
>> And never checks and resumes the threads. So I removed that lock instead 
>> just sleep and check until all thread have the expected suspended state.
>> 
>> ----
>> 
>> This version already contains updates after pre-review comments from 
>> @dcubed-ojdk, @pchilano, @coleenp.
>> (Pre-review comments here:
>> https://github.com/openjdk/jdk/pull/2625)
>> 
>> ---- 
>> Testing t1-t8, nsk_jdi/nsk_jvmti/jdk_jdi/tck, KS24, RunThese and
>> combinations like running with -XX:ZCollectionInterval=0.01 -
>> XX:ZFragmentationLimit=0.
>> Running above some of above concurrently (load ~240), slow debug,
>> etc...
>
> Robbin Ehn has updated the pull request with a new target base due to a merge 
> or a rebase. The pull request now contains six commits:
> 
>  - White space fixes
>  - Merge branch 'master' into SuspendInHandshake
>  - Review fixes
>  - Merge branch 'master' into SuspendInHandshake
>  - Merge branch 'master' into SuspendInHandshake
>  - 8257831: Suspend with handshake (review baseline)

I'm still really happy with this fix! Thumbs up!
Reminder: don't forget to decide what to do about the obsolete options.
@dholmes-ora can provide guidance on how to handle this type of removal.

src/hotspot/share/runtime/handshake.cpp line 632:

> 630: }
> 631: 
> 632: void HandshakeState::self_suspened() {

Typo: s/self_suspened/self_suspended/

src/hotspot/share/runtime/handshake.cpp line 654:

> 652:   void do_thread(Thread* thr) {
> 653:     JavaThread* target = thr->as_Java_thread();
> 654:     target->handshake_state()->self_suspened();

Typo: s/self_suspened/self_suspended/

src/hotspot/share/runtime/handshake.hpp line 131:

> 129:       return true;
> 130:     }
> 131:     OrderAccess::loadload();

Needs a comment explaining what the memory sync is for.

src/hotspot/share/runtime/handshake.hpp line 156:

> 154:   // This flag is true while there is async handshake (trap)
> 155:   // on queue. Since we do only need one, we can reuse it if
> 156:   // thread thread gets suspended again (after a resume)

typo: s/thread thread/thread/

src/hotspot/share/runtime/objectMonitor.cpp line 410:

> 408: 
> 409:     current->frame_anchor()->make_walkable(current);
> 410:     OrderAccess::storestore();

Needs a comment explaining what the memory sync is for.

src/hotspot/share/runtime/objectMonitor.cpp line 970:

> 968: 
> 969:       current->frame_anchor()->make_walkable(current);
> 970:       OrderAccess::storestore();

Needs a comment explaining what the memory sync is for.

src/hotspot/share/runtime/objectMonitor.cpp line 977:

> 975:         if (_succ == current) {
> 976:             _succ = NULL;
> 977:             OrderAccess::fence();

Please add a comment to this line:
`OrderAccess::fence();  // always do a full fence when successor is cleared`

src/hotspot/share/runtime/objectMonitor.cpp line 1540:

> 1538:     {
> 1539:       current->frame_anchor()->make_walkable(current);
> 1540:       OrderAccess::storestore();

Needs a comment explaining what the memory sync is for.

src/hotspot/share/runtime/objectMonitor.cpp line 1555:

> 1553:         if (_succ == current) {
> 1554:             _succ = NULL;
> 1555:             OrderAccess::fence();

Please add a comment to this line:
OrderAccess::fence(); // always do a full fence when successor is cleared

src/hotspot/share/runtime/thread.cpp line 455:

> 453:   ParkEvent::Release(_ParkEvent);
> 454:   // Can be racingly loaded in signal handler via has_terminated()
> 455:   Atomic::store(&_ParkEvent, (ParkEvent*)NULL);

Nice solution for that odd signal handler issue.

-------------

Marked as reviewed by dcubed (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/3191

Reply via email to