When `ThreadSnapshotFactory::get_thread_snapshot()` captures a snapshot of a virtual thread, it uses `JvmtiVTMSTransitionDisabler` class to disable mount/unmount transitions. However, this only works if a JVMTI agent has attached to the VM, otherwise virtual threads don’t honor the disable request. Since this snapshot mechanism is used by `jcmd Thread.dump_to_file` and `HotSpotDiagnosticMXBean` which don’t require a JVMTI agent to be present, getting the snapshot of a virtual thread in such cases can lead to crashes.
This patch moves the transition-disabling mechanism out of JVMTI and into a new class, `MountUnmountDisabler`. The code has been updated so that transitions can be disabled independently of JVMTI, making JVMTI just one user of the API rather than the owner of the mechanism. Here is a summary of the key changes: - Currently when a virtual thread starts a mount/unmount transition we only need to check if `_VTMS_notify_jvmti_events` is set to decide if we need to go to the slow path. With these changes, JVMTI is now only one user of the API, so we instead check the actual transition disabling counters, i.e the per-vthread counter and the global counter. Since these can be set at any time (unlike `_VTMS_notify_jvmti_events` which is only set at startup or during a safepoint in case of late binding agents), we follow the classic Dekker pattern for the required synchronization. That is, the virtual thread sets the “in transition” bits for the carrier and vthread *before* reading the “transition disabled” counters. The thread requesting to disable transitions increments the “transition disabled” counter *before* reading the “in transition” bits. An alternative that avoids the extra fence would be to place extra overhead on the thread requesting to disable transitions (e.g. by using a safepoint, handshake-all, or UseSystemMemoryBarrier). Performance analysis show no difference with current mainline so I believe this approach is simpler. - Ending the transition doesn’t need to check if transitions are disabled (equivalent to not need to poll when transitioning from unsafe to safe safepoint state). But we still require to go through the slow path if there is a JVMTI agent present, since we need to check for event posting and JVMTI state rebinding. As such, the end transition follows the same pattern that we have today of only needing to check `_VTMS_notify_jvmti_events`. - The code was structured in terms of mount and unmount cases, and a variable was used to differentiate between start or end of the transition. With the changes to make the mechanism independent of JVMTI it becomes simpler to invert this and structure the code in terms of start transition and end transition, and use a variable to differentiate between mount and unmount cases. - All JVMTI code required during start/end transitions has been encapsulated in classes `JVMTIStartTransition` and `JVMTIEndTransition`. I kept the ordering of event posting as it is today. - Global variables `_sync_protocol_enabled_count` and `_sync_protocol_enabled_permanently` were removed. Variable `_VTMS_transition_disable_for_all_count` was renamed to `_global_start_transition_disable_count`, `_SR_mode` to `_exclusive_operation_ongoing` and `_VTMS_notify_jvmti_events` to `_notify_jvmti_events`. New global variable `_active_disablers` replaces the functionality of `_VTMS_transition_disable_for_one_count`. - Now, when the first agent attaches we not only set `_notify_jvmti_events` but we also increase global counter `_global_start_transition_disable_count`. This has the effect of always forcing the slow path when starting and ending a transition as we do today when `_VTMS_notify_jvmti_events` is set. A new `Handshake::execute` variant to handshake a virtual thread is introduced with this patch, which makes use of the new `MountUnmountDisabler` class. Method `ThreadSnapshotFactory::get_thread_snapshot` has been simplified to use this handshake variant to capture the snapshot of a virtual thread. The changes include new test `DumpThreadsWhenParking.java` from @AlanBateman which reliably reproduces the issue. I also verified the changes in Mach5 tiers1-7. Thanks, Patricio ------------- Commit messages: - v1 Changes: https://git.openjdk.org/jdk/pull/28361/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28361&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8364343 Stats: 1848 lines in 40 files changed: 844 ins; 824 del; 180 mod Patch: https://git.openjdk.org/jdk/pull/28361.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28361/head:pull/28361 PR: https://git.openjdk.org/jdk/pull/28361
