On Wed, 12 Nov 2025 13:29:07 GMT, Anton Artemov <[email protected]> wrote:
>> Hi, please consider the following changes: >> >> If suspension is allowed when a thread is re-entering an object monitor >> (OM), then a deadlock is possible. There are two places where it can happen: >> >> 1) The waiting thread is made to be a successor and is unparked. Upon a >> suspension request, the thread will suspend itself whilst clearing the >> successor. The OM will be left unlocked (not grabbed by any thread), while >> the other threads are parked until a thread grabs the OM and the exits it. >> The suspended thread is on the entry-list and can be selected as a successor >> again. None of other threads can be woken up to grab the OM until the >> suspended thread has been resumed and successfully releases the OM. >> >> 2) The race between suspension and retry: the thread could reacquire the OM >> and complete the wait() code in full, but then on return to Java it will be >> suspended while holding the OM. >> >> The issues are addressed by not allowing suspension in case 1, and by >> handling the suspension request at a later stage, after the thread has >> grabbed the OM in `reenter_internal()` in case 2. In case of a suspension >> request, the thread exits the OM and enters it again once resumed. >> >> The JVMTI `waited` event posting (2nd one) is postponed until the suspended >> thread is resumed and has entered the OM again. The `enter` to the OM (in >> case `ExitOnSuspend` did exit) is done without posting any events. >> >> Tests are added for both scenarios. >> >> Tested in tiers 1 - 7. > > Anton Artemov has updated the pull request with a new target base due to a > merge or a rebase. The pull request now contains 20 commits: > > - Merge remote-tracking branch 'origin/master' into > JDK-8366659-OM-wait-suspend-deadlock > - 8366659: Fixed lines in tests. > - Merge remote-tracking branch 'origin/master' into > JDK-8366659-OM-wait-suspend-deadlock > - 8366659: Added a comment to a boolean arg for enter() > - Merge remote-tracking branch 'origin/master' into > JDK-8366659-OM-wait-suspend-deadlock > - Merge remote-tracking branch 'origin/master' into > JDK-8366659-OM-wait-suspend-deadlock > - 8366659: Fixed new lines. > - Merge remote-tracking branch 'origin/master' into > JDK-8366659-OM-wait-suspend-deadlock > - 8366659: Removed incorrect assert, > - 8366659: Fixed merge conflict > - ... and 10 more: https://git.openjdk.org/jdk/compare/400a83da...702880c6 I've done another crawl thru review. Thanks for making changes after the first round of comments. In this crawl-thru, my deep focus was on the modified test. test/hotspot/jtreg/serviceability/jvmti/SuspendWithObjectMonitorWait/SuspendWithObjectMonitorWait.java line 297: > 295: } > 296: > 297: // Notify the resumer while holding the threadLock Nit: please add a period at the end of this sentence. test/hotspot/jtreg/serviceability/jvmti/SuspendWithObjectMonitorWait/SuspendWithObjectMonitorWait.java line 372: > 370: // - a threadLock enter in the resumer thread > 371: // - resumption of the waiter thread > 372: // - a threadLock enter in the freshly resumed waiter > thread This list of step tests is identical to the list on L490 -> L493 and the original llist on L256 -> L259. This step comment: `370: // - a threadLock enter in the resumer thread` should be updated to something like: // - a blocked threadLock enter in the resumer thread while the // threadLock is held by the main thread. This change of threadLock scope also requires this update from: 605: // - tries to grab the threadLock (should not block!) to: 605: // - tries to grab the threadLock (should not block with doWork1!) test/hotspot/jtreg/serviceability/jvmti/SuspendWithObjectMonitorWait/SuspendWithObjectMonitorWait.java line 388: > 386: try { > 387: Thread.sleep(1000); > 388: } catch(Exception e) {} Why is this 1 second delay needed? test/hotspot/jtreg/serviceability/jvmti/SuspendWithObjectMonitorWait/SuspendWithObjectMonitorWait.java line 428: > 426: // launch the waiter thread > 427: synchronized (barrierLaunch) { > 428: waiter = new > SuspendWithObjectMonitorWaitWorker("waiter", 1); This change to `wait` for `1` instead of `0` requires this comment to be updated from: // TS_READY_TO_NOTIFY is set after the main thread has // entered threadLock so a spurious wakeup can't get the // waiter thread out of this threadLock.wait(0) call: to: // TS_READY_TO_NOTIFY is set after the main thread has // entered threadLock so a spurious wakeup can't get the // waiter thread out of this threadLock.wait(0) call in // doWork1 or doWork2. doWork3 passes a one so that the // wait() can terminate early and block on reentry. I'm having trouble seeing why this third test case is necessary. We do a short `wait(1)` in this test case instead of a `wait(0)` so we terminate the `wait(1)` with a timeout instead of a `notify()` from the main thread. In all worker test cases, the main thread grabs the threadsLock when the "waiter" thread calls `wait()`, the main thread does a `notify()`, the main thread waits until the worker thread contends on threadsLock and finally the main thread suspends the worker thread. The only thing that I see that the `wait(1)` brings to the party is that the worker3 thread might get to re-entry block on threadsLock via a timeout instead of a notify. What am I missing here? test/hotspot/jtreg/serviceability/jvmti/SuspendWithObjectMonitorWait/SuspendWithObjectMonitorWait.java line 454: > 452: try { > 453: Thread.sleep(1000); > 454: } catch(Exception e) {} Why is this 1 second delay needed? test/hotspot/jtreg/serviceability/jvmti/SuspendWithObjectMonitorWait/SuspendWithObjectMonitorWait.java line 493: > 491: // - a threadLock enter in the resumer thread > 492: // - resumption of the waiter thread > 493: // - a threadLock enter in the freshly resumed waiter > thread This list of step tests is identical to the list on L369 -> L372 and the original llist on L256 -> L259. This step comment: 491: // - a threadLock enter in the resumer thread should be updated to something like: // - a blocked threadLock enter in the resumer thread while the // threadLock is held by the main thread. test/hotspot/jtreg/serviceability/jvmti/SuspendWithObjectMonitorWait/SuspendWithObjectMonitorWait.java line 509: > 507: try { > 508: Thread.sleep(1000); > 509: } catch(Exception e) {} Why is this 1 second delay needed? ------------- Marked as reviewed by dcubed (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27040#pullrequestreview-3454633483 PR Review Comment: https://git.openjdk.org/jdk/pull/27040#discussion_r2519187101 PR Review Comment: https://git.openjdk.org/jdk/pull/27040#discussion_r2519214534 PR Review Comment: https://git.openjdk.org/jdk/pull/27040#discussion_r2519476007 PR Review Comment: https://git.openjdk.org/jdk/pull/27040#discussion_r2519522596 PR Review Comment: https://git.openjdk.org/jdk/pull/27040#discussion_r2519194484 PR Review Comment: https://git.openjdk.org/jdk/pull/27040#discussion_r2519205810 PR Review Comment: https://git.openjdk.org/jdk/pull/27040#discussion_r2519483650
