Re: RFR: 8257831: Suspend with handshakes [v2]

Richard Reingruber Tue, 30 Mar 2021 14:17:13 -0700

On Fri, 26 Mar 2021 13:21:43 GMT, Robbin Ehn <r...@openjdk.org> wrote:


>> A suspend request is done by handshaking thread target thread(s). When 
>> executing the handshake operation we know the target mutator thread is in a 
>> dormant state (as in safepoint safe state). We have a guarantee that it will 
>> check it's poll before leaving the dormant state. To stop the thread from 
>> leaving the the dormant state we install a second asynchronous handshake to 
>> be executed by the targeted thread. The asynchronous handshake will wait on 
>> a monitor while the thread is suspended. The target thread cannot not leave 
>> the dormant state without a resume request.
>> 
>> Per thread suspend requests are naturally serialized by the per thread 
>> HandshakeState lock (we can only execute one handshake at a time per thread).
>> Instead of having a separate lock we use this to our advantage and use 
>> HandshakeState lock for serializing access to the suspend flag and for 
>> wait/notify. 
>> 
>> Suspend:
>> Requesting thread -> synchronous handshake -> target thread
>> Inside synchronus handshake (HandshakeState lock is locked while
>> executing any handshake):
>>      - Set suspended flag
>>      - Install asynchronous handshake
>> 
>> Target thread -> tries to leave dormant state -> Executes handshakes
>> Target only executes asynchronous handshake:
>>      - While suspended
>>      - Go to blocked
>>      - Wait on HandshakeState lock
>> 
>> Resume:
>> Resuming thread:
>>      - Lock HandshakeState lock
>>      - Clear suspended flag
>>      - Notify HandshakeState lock
>>      - Unlock HandshakeState lock
>> 
>> The "suspend requested" flag is an optimization, without it a dormant thread 
>> could be suspended and resumed many times and each would add a new 
>> asynchronous handshake. Suspend requested flag means there is already an 
>> asynchronous suspend handshake in queue which can be re-used, only the 
>> suspend flag needs to be set.
>> 
>> ----
>> Some code can be simplified or done in a smarter way but I refrained from 
>> doing such changes instead tried to keep existing code as is as far as 
>> possible. This concerns especially raw monitors.
>> 
>> ----
>> Regarding the changed test, the documentation says:
>> "If the calling thread is specified in the request_list array, this function 
>> will not return until some other thread resumes it."
>> 
>> But the code:
>>   LOG("suspendTestedThreads: before JVMTI SuspendThreadList");
>>   err = jvmti->SuspendThreadList(threads_count, threads, results);
>>   ...
>>   // Allow the Main thread to inspect the result of tested threads 
>> suspension        
>>   agent_unlock(jni);
>>   
>> The thread will never return from SuspendThreadList until resumed, so it 
>> cannot unlock with agent_unlock().
>> Thus main thread is stuck forever on:
>>   // Block until the suspender thread competes the tested threads suspension 
>>   agent_lock(jni);
>> 
>> And never checks and resumes the threads. So I removed that lock instead 
>> just sleep and check until all thread have the expected suspended state.
>> 
>> ----
>> 
>> This version already contains updates after pre-review comments from 
>> @dcubed-ojdk, @pchilano, @coleenp.
>> (Pre-review comments here:
>> https://github.com/openjdk/jdk/pull/2625)
>> 
>> ---- 
>> Testing t1-t8, nsk_jdi/nsk_jvmti/jdk_jdi/tck, KS24, RunThese and
>> combinations like running with -XX:ZCollectionInterval=0.01 -
>> XX:ZFragmentationLimit=0.
>> Running above some of above concurrently (load ~240), slow debug,
>> etc...
>
> Robbin Ehn has updated the pull request with a new target base due to a merge 
> or a rebase. The pull request now contains two commits:
> 
>  - Merge branch 'master' into SuspendInHandshake
>  - 8257831: Suspend with handshake (review baseline)

Hi Robbin,

this is a great simplification. Excellent work! :)

Thanks, Richard.

src/hotspot/share/prims/jvmtiRawMonitor.cpp line 318:

> 316:   if (self->is_Java_thread()) {
> 317:     jt = self->as_Java_thread();
> 318:     while (true) {

>From the PR description:

> Some code can be simplified or done in a smarter way but I refrained from 
> doing such changes instead tried to keep existing code as is as far as 
> possible. This concerns especially raw monitors.

I read this section but then forgot about it. So you can skip the following 
comment.

Possible follow-up: It could be worth the attempt to make use of the generic 
state transition classes provided by interface.hpp. Directly I don't see why 
this wouldn't be feasible and I doubt the [old comment in 
JvmtiEnv::RawMonitorEnter()](https://github.com/reinrich/jdk/blob/aefc1560b51f0ce96d8f5ce396ba0d2fe08fd650/src/hotspot/share/prims/jvmtiEnv.cpp#L3208-L3214).
This would make the custom suspend logic here superfluous.
If ThreadBlockInVM was changed to take a generic callback for unlocking, then 
this could replace the custom logic in L360-L374.

src/hotspot/share/prims/jvmtiRawMonitor.cpp line 323:

> 321:       // Suspend request flag can only be set in handshakes.
> 322:       // By blocking handshakes, suspend request flag cannot change its 
> value.
> 323:       if (!jt->handshake_state()->is_suspend_requested()) {

Shouldn't `jt->handshake_state()->is_suspend_requested()` be replaced with 
`jt->is_suspended()`? See also comment on declaration of `_suspend_requested`.

src/hotspot/share/prims/jvmtiRawMonitor.cpp line 364:

> 362:     for (;;) {
> 363:       simple_enter(jt);
> 364:       if (!SafepointMechanism::should_process(jt)) {

It seems to be likely that the condition is false in the first loop iteration 
and the thread has to do another iteration even if not suspended. Wouldn't it 
be ok to break from the loop if `!jt->is_suspended()`?
I'm asking because in ObjectMonitor::enter() L414 there is similar code and the 
condition there is `SafepointMechanism::should_process(current) && 
current->suspend_request_pending()`

src/hotspot/share/prims/jvmtiRawMonitor.cpp line 421:

> 419:     guarantee(jt->thread_state() == _thread_in_native, "invariant");
> 420:     for (;;) {
> 421:       if (!SafepointMechanism::should_process(jt)) {

Potential follow-up enhancement: there's a very similar for(;;) loop in 
JvmtiRawMonitor::raw_enter(). It looks as if it was possible to merge both 
loops into just one loop in simple_enter().

src/hotspot/share/runtime/handshake.cpp line 485:

> 483:       } else {
> 484:         // Asynchronous may block so they may not execute 
> ~PreserveExceptionMark before safepointing
> 485:         // in outer loop.

Sorry, I don't understand the comment.

src/hotspot/share/runtime/handshake.cpp line 486:

> 484:         // Asynchronous may block so they may not execute 
> ~PreserveExceptionMark before safepointing
> 485:         // in outer loop.
> 486:         op->do_handshake(_handshakee);

Maybe add PauseNoSafepointVerifier to document that the current thread can 
transition to a safe state?

src/hotspot/share/runtime/handshake.cpp line 492:

> 490:       }
> 491:     } else {
> 492:       return true;

`true` as return value means that the caller does _not_ need to check again for 
a safepoint, correct? Maybe this could be added as comment to the method 
declaration?

src/hotspot/share/runtime/handshake.cpp line 617:

> 615:     {
> 616:       MutexLocker ml(&_lock, Mutex::_no_safepoint_check_flag);
> 617:       if (!is_suspend_requested()) {

The current thread is _thread_in_vm. Can it be suspended then? For a suspend it 
has to be in a safe thread state which it cannot leave while suspended. So it 
must have reached _thread_in_vm not suspended and it cannot be suspended while 
in that state.

src/hotspot/share/runtime/handshake.cpp line 715:

> 713: bool HandshakeState::resume() {
> 714:   // Can't be suspended if there is no suspend request.
> 715:   if (!is_suspend_requested()) {

Shouldn't `is_suspend_requested()` be replaced with `is_suspended()`? See also 
comment on declaration of `_suspend_requested`.

src/hotspot/share/runtime/handshake.cpp line 720:

> 718:   MutexLocker ml(&_lock, Mutex::_no_safepoint_check_flag);
> 719:   // Can't be suspended if there is no suspend request.
> 720:   if (!is_suspend_requested()) {

Shouldn't `is_suspend_requested()` be replaced with `is_suspended()`? See also 
comment on declaration of `_suspend_requested`.

src/hotspot/share/runtime/handshake.hpp line 142:

> 140:  private:
> 141:   volatile bool _suspended;
> 142:   volatile bool _suspend_requested;

According to the PR description `_suspend_requested` is an optimization.

> The "suspend requested" flag is an optimization, without it a dormant thread
> could be suspended and resumed many times and each would add a new
> asynchronous handshake. Suspend requested flag means there is already an
> asynchronous suspend handshake in queue which can be re-used, only the suspend
> flag needs to be set.

I think there are a few places where _suspend_requested is queried by mistake 
instead of _suspended. Maybe it would help to prevent this if 
_suspend_requested was renamed to something that better describes its purpose, 
e.g. _has_async_suspend_handshake or similar.

src/hotspot/share/runtime/mutex.cpp line 244:

> 242:     wait_status = _lock.wait(timeout);
> 243:     in_flight_mutex = this;  // save for ~ThreadBlockInVM
> 244: 

Empty line can be removed.

src/hotspot/share/runtime/mutexLocker.hpp line 260:

> 258: 
> 259:   bool wait(int64_t timeout = 0,
> 260:             bool as_suspend_equivalent = 
> !Mutex::_as_suspend_equivalent_flag) {

The declaration of Mutex::_as_suspend_equivalent_flag should be removed. You 
might want to grep for 'equivalent' to find more that can be removed.

src/hotspot/share/runtime/objectMonitor.cpp line 415:

> 413:       current->set_thread_state_fence(_thread_blocked_trans);
> 414:       if (SafepointMechanism::should_process(current) &&
> 415:         current->suspend_request_pending()) {

Shouldn't `suspend_request_pending()` be replaced with `is_suspended()`? See 
also comment on declaration of `_suspend_requested`.
And isn't checking `SafepointMechanism::should_process(current)` redundant?

src/hotspot/share/runtime/objectMonitor.cpp line 973:

> 971:       if (SafepointMechanism::should_process(current)) {
> 972:         if (_succ == current) {
> 973:             _succ = NULL;

IIUC then this is now not only executed if a thread is suspended but also when 
there's a safepoint / handshake. I tried to understand the effect of this but 
failed with timeout ;)

src/hotspot/share/runtime/os.cpp line 874:

> 872: 
> 873: void os::start_thread(Thread* thread) {
> 874:   if (thread->is_Java_thread()) {

Then and else blocks seem to do the very same things.

src/hotspot/share/runtime/safepointMechanism.cpp line 101:

> 99:         !thread->handshake_state()->process_by_self()) {
> 100:       need_rechecking = true;
> 101:     }

What about this version
    if (thread->handshake_state()->should_process()) {
      need_rechecking = !thread->handshake_state()->process_by_self();
    }
or even 
    need_rechecking =
        thread->handshake_state()->should_process() && 
!thread->handshake_state()->process_by_self();
With the latter you could eliminate L82
      need_rechecking = false;
Also I'd find it more natural if `process_by_self()` could return true if 
rechecking is needed.

src/hotspot/share/runtime/sweeper.cpp line 276:

> 274: 
> 275:     ThreadBlockInVM tbivm(thread);
> 276:     thread->java_suspend_self();

In the baseline version of NMethodSweeper::handle_safepoint_request() there is 
a call `thread->java_suspend_self()`. This call will immediately return, won't 
it? Do you know what the purpose of this call was? The destructor of 
ThreadBlockInVM will block the current thread for the safepoint. So 
`java_suspend_self()` seems redundant.

-------------

Changes requested by rrich (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/3191

Re: RFR: 8257831: Suspend with handshakes [v2]

Reply via email to