On Fri, 9 Apr 2021 16:12:16 GMT, Daniel D. Daugherty <dcu...@openjdk.org> wrote:
>> src/hotspot/share/runtime/handshake.cpp line 630: >> >>> 628: // exiting. >>> 629: } >>> 630: } >> >> I need a little help learning the steps of this dance :) >> >> We reach here in _thread_in_vm. We cannot be suspended in this state. There >> might be another thread waiting to handshake us to suspend us but why can't >> we >> just ignore that and do the `_handshakee->set_exiting();` even without >> locking >> HandshakeState::_lock? >> >> Adding a handshake operation is lock free, so even if the check >> `SafepointMechanism::should_process(_handshakee)` in L619 returns false a new >> operation to process could have been added concurrently such that >> `SafepointMechanism::should_process(_handshakee)` would return true when we >> execute `_handshakee->set_exiting();` in L620. What am I missing? > > `HandshakeState::suspend()` is a synchronous handshake and adding the > handshake to the queue is lock free, but the execution of the synchronous > handshake itself requires a `HandshakeState::claim_handshake()` call which > does acquire the lock in question. We (the suspend requester) hold the lock > while the handshake is being processed so we either detect that > _handshakee->set_exiting() won the race (in the target thread) or we (the > suspend requester) win the race of setting the suspend flag so the target > thread can't exit yet. > > Hopefully that helps explain this dance. Hi Dan, thanks for picking up my question! > `HandshakeState::suspend()` is a synchronous handshake and adding the > handshake to the queue is lock free, but the execution of the synchronous > handshake itself requires a `HandshakeState::claim_handshake()` call which > does acquire the lock in question. My point would be that the attempt to execute the synchronous handshake for suspending a thread that is just about to call HandshakeState::thread_exit() cannot make progress (blocks) while the target thread is not safe (_thread_in_vm). A synchronous handshake requires the target thread to be in a safe state for the requester to execute the handshake operation. When executing HandshakeState::thread_exit() the suspendee is _thread_in_vm. And the requester will find it to be `_not_safe` when calling `possibly_can_process_handshake()` before calling `HandshakeState::claim_handshake()` or when calling `can_process_handshake()` afterwards. In both cases try_process() returns with failure _not_safe and the lock is not held. ++ 546 if (!possibly_can_process_handshake()) { 547 // JT is observed in an unsafe state, it must notice the handshake itself 548 return HandshakeState::_not_safe; 549 } 550 551 // Claim the mutex if there still an operation to be executed. 552 if (!claim_handshake()) { 553 return HandshakeState::_claim_failed; 554 } 555 556 // If we own the mutex at this point and while owning the mutex we 557 // can observe a safe state the thread cannot possibly continue without 558 // getting caught by the mutex. 559 if (!can_process_handshake()) { 560 _lock.unlock(); 561 return HandshakeState::_not_safe; 562 } So isn't being unsafe sufficient to sync with suspend requests? ------------- PR: https://git.openjdk.java.net/jdk/pull/3191