On Wed, 12 Nov 2025 13:29:07 GMT, Anton Artemov <[email protected]> wrote:

>> Hi, please consider the following changes:
>> 
>> If suspension is allowed when a thread is re-entering an object monitor 
>> (OM), then a deadlock is possible. There are two places where it can happen:
>> 
>> 1) The waiting thread is made to be a successor and is unparked. Upon a 
>> suspension request, the thread will suspend itself whilst clearing the 
>> successor. The OM will be left unlocked (not grabbed by any thread), while 
>> the other threads are parked until a thread grabs the OM and the exits it. 
>> The suspended thread is on the entry-list and can be selected as a successor 
>> again. None of other threads can be woken up to grab the OM until the 
>> suspended thread has been resumed and successfully releases the OM.
>> 
>> 2) The race between suspension and retry: the thread could reacquire the OM 
>> and complete the wait() code in full, but then on return to Java it will be 
>> suspended while holding the OM.
>> 
>> The issues are addressed by not allowing suspension in case 1, and by 
>> handling the suspension request at a later stage, after the thread has 
>> grabbed the OM in `reenter_internal()` in case 2. In case of a suspension 
>> request, the thread exits the OM and enters it again once resumed. 
>> 
>> The JVMTI `waited` event posting (2nd one) is postponed until the suspended 
>> thread is resumed and has entered the OM again.  The `enter` to the OM (in 
>> case `ExitOnSuspend` did exit) is done without posting any events.
>> 
>> Tests are added for both scenarios. 
>> 
>> Tested in tiers 1 - 7.
>
> Anton Artemov has updated the pull request with a new target base due to a 
> merge or a rebase. The pull request now contains 20 commits:
> 
>  - Merge remote-tracking branch 'origin/master' into 
> JDK-8366659-OM-wait-suspend-deadlock
>  - 8366659: Fixed lines in tests.
>  - Merge remote-tracking branch 'origin/master' into 
> JDK-8366659-OM-wait-suspend-deadlock
>  - 8366659: Added a comment to a boolean arg for enter()
>  - Merge remote-tracking branch 'origin/master' into 
> JDK-8366659-OM-wait-suspend-deadlock
>  - Merge remote-tracking branch 'origin/master' into 
> JDK-8366659-OM-wait-suspend-deadlock
>  - 8366659: Fixed new lines.
>  - Merge remote-tracking branch 'origin/master' into 
> JDK-8366659-OM-wait-suspend-deadlock
>  - 8366659: Removed incorrect assert,
>  - 8366659: Fixed merge conflict
>  - ... and 10 more: https://git.openjdk.org/jdk/compare/400a83da...702880c6

I've done another crawl thru review. Thanks for making changes after the first
round of comments. In this crawl-thru, my deep focus was on the modified test.

test/hotspot/jtreg/serviceability/jvmti/SuspendWithObjectMonitorWait/SuspendWithObjectMonitorWait.java
 line 297:

> 295:     }
> 296: 
> 297:     // Notify the resumer while holding the threadLock

Nit: please add a period at the end of this sentence.

test/hotspot/jtreg/serviceability/jvmti/SuspendWithObjectMonitorWait/SuspendWithObjectMonitorWait.java
 line 372:

> 370:                 // - a threadLock enter in the resumer thread
> 371:                 // - resumption of the waiter thread
> 372:                 // - a threadLock enter in the freshly resumed waiter 
> thread

This list of step tests is identical to the list on L490 -> L493 and the
original llist on L256 -> L259.

This step comment:
`370:               // - a threadLock enter in the resumer thread`
should be updated to something like:

             // - a blocked threadLock enter in the resumer thread while the
             //   threadLock is held by the main thread.


This change of threadLock scope also requires this update from:

605:        // - tries to grab the threadLock (should not block!)


to:

605:        // - tries to grab the threadLock (should not block with doWork1!)

test/hotspot/jtreg/serviceability/jvmti/SuspendWithObjectMonitorWait/SuspendWithObjectMonitorWait.java
 line 388:

> 386:                 try {
> 387:                     Thread.sleep(1000);
> 388:                 } catch(Exception e) {}

Why is this 1 second delay needed?

test/hotspot/jtreg/serviceability/jvmti/SuspendWithObjectMonitorWait/SuspendWithObjectMonitorWait.java
 line 428:

> 426:             // launch the waiter thread
> 427:             synchronized (barrierLaunch) {
> 428:                 waiter = new 
> SuspendWithObjectMonitorWaitWorker("waiter", 1);

This change to `wait` for `1` instead of `0` requires this comment to be 
updated from:

                // TS_READY_TO_NOTIFY is set after the main thread has
                // entered threadLock so a spurious wakeup can't get the
                // waiter thread out of this threadLock.wait(0) call:

to:

                // TS_READY_TO_NOTIFY is set after the main thread has
                // entered threadLock so a spurious wakeup can't get the
                // waiter thread out of this threadLock.wait(0) call in
                // doWork1 or doWork2. doWork3 passes a one so that the
                // wait() can terminate early and block on reentry.


I'm having trouble seeing why this third test case is necessary. We do a short 
`wait(1)`
in this test case instead of a `wait(0)` so we terminate the `wait(1)` with a 
timeout instead
of a `notify()` from the main thread. 

In all worker test cases, the main thread grabs the threadsLock when the 
"waiter" thread
calls `wait()`, the main thread does a `notify()`, the main thread waits until 
the worker
thread contends on threadsLock and finally the main thread suspends the worker 
thread.
The only thing that I see that the `wait(1)` brings to the party is that the 
worker3 thread
might get to re-entry block on threadsLock via a timeout instead of a notify.

What am I missing here?

test/hotspot/jtreg/serviceability/jvmti/SuspendWithObjectMonitorWait/SuspendWithObjectMonitorWait.java
 line 454:

> 452:             try {
> 453:                 Thread.sleep(1000);
> 454:             } catch(Exception e) {}

Why is this 1 second delay needed?

test/hotspot/jtreg/serviceability/jvmti/SuspendWithObjectMonitorWait/SuspendWithObjectMonitorWait.java
 line 493:

> 491:                 // - a threadLock enter in the resumer thread
> 492:                 // - resumption of the waiter thread
> 493:                 // - a threadLock enter in the freshly resumed waiter 
> thread

This list of step tests is identical to the list on L369 -> L372 and the
original llist on L256 -> L259.

This step comment:

491:               // - a threadLock enter in the resumer thread

should be updated to something like:

             // - a blocked threadLock enter in the resumer thread while the
             //   threadLock is held by the main thread.

test/hotspot/jtreg/serviceability/jvmti/SuspendWithObjectMonitorWait/SuspendWithObjectMonitorWait.java
 line 509:

> 507:                 try {
> 508:                     Thread.sleep(1000);
> 509:                 } catch(Exception e) {}

Why is this 1 second delay needed?

-------------

Marked as reviewed by dcubed (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27040#pullrequestreview-3454633483
PR Review Comment: https://git.openjdk.org/jdk/pull/27040#discussion_r2519187101
PR Review Comment: https://git.openjdk.org/jdk/pull/27040#discussion_r2519214534
PR Review Comment: https://git.openjdk.org/jdk/pull/27040#discussion_r2519476007
PR Review Comment: https://git.openjdk.org/jdk/pull/27040#discussion_r2519522596
PR Review Comment: https://git.openjdk.org/jdk/pull/27040#discussion_r2519194484
PR Review Comment: https://git.openjdk.org/jdk/pull/27040#discussion_r2519205810
PR Review Comment: https://git.openjdk.org/jdk/pull/27040#discussion_r2519483650

Reply via email to