On Mon, 18 Oct 2021 20:02:21 GMT, Richard Reingruber <rr...@openjdk.org> wrote:

>> This change fixes deadlocks described in the JBS-bug by:
>> 
>> * Releasing `handlerLock` before waiting on `threadLock` in 
>> `blockOnDebuggerSuspend()`
>> 
>> * Notifying on `threadLock` in `threadControl_reset()`
>> 
>> Also the actions in handleAppResumeBreakpoint() are moved/deferred until
>> doPendingTasks() runs. This is necessary because an event handler must not
>> release handlerLock first and foremost because handlers are called while
>> iterating the handler chain for an event type which is protected by 
>> handlerLock
>> (see https://github.com/openjdk/jdk/pull/5805)
>> 
>> The first commit delays the cleanup actions after leaving the loop in
>> `debugLoop_run()`. It allows to reproduce the deadlock running the dispose003
>> test with the command
>> 
>> 
>> make run-test 
>> TEST=test/hotspot/jtreg/vmTestbase/nsk/jdi/VirtualMachine/dispose/dispose003
>> 
>> 
>> The second commit adds a new test that reproduces the deadlock when calling
>> threadControl_resumeThread() while a thread is waiting in
>> blockOnDebuggerSuspend().
>> 
>> The third commit contains the fix described above. With it the deadlocks
>> cannot be reproduced anymore.
>> 
>> The forth commit removes the diagnostic code introduced with the first 
>> commit again.
>> 
>> The fix passed
>> 
>> test/hotspot/jtreg/serviceability/jdwp
>> test/jdk/com/sun/jdi
>> test/hotspot/jtreg/vmTestbase/nsk/jdwp
>> test/hotspot/jtreg/vmTestbase/nsk/jdi
>
> Richard Reingruber has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Call blockOnDebuggerSuspend() after setup of the resumer tracking.

src/jdk.jdwp.agent/share/native/libjdwp/threadControl.c line 770:

> 768:      * handlerLock which has to be acquired before threadLock.
> 769:      */
> 770:     debugMonitorExit(threadLock);

I think we can avoid the exit here. threadLock was grabbed in 
`threadControl_onEventHandlerExit()`. It probably should be released before 
calling `doPendingTasks()`. My suggestion would be to first release it right 
after the `findThread()` call, and then in the `ei == EI_THREAD_END` block grab 
it again at the start of the block and release at the end of the block.

src/jdk.jdwp.agent/share/native/libjdwp/threadControl.c line 771:

> 769:      */
> 770:     debugMonitorExit(threadLock);
> 771:     eventHandler_lock(); /* for proper lock order */

"for proper lock order during eventHandler_createInternalThreadOnly() calls"

src/jdk.jdwp.agent/share/native/libjdwp/threadControl.c line 772:

> 770:     debugMonitorExit(threadLock);
> 771:     eventHandler_lock(); /* for proper lock order */
> 772:     debugMonitorEnter(threadLock);

Somewhere we need to mention that `trackAppResume()` exits with threadLock 
still being held, although with my recommended changes below this would no 
longer be the case.

src/jdk.jdwp.agent/share/native/libjdwp/threadControl.c line 807:

> 805:     }
> 806: 
> 807:     eventHandler_unlock();

This is still a bit odd looking because we grabbed this lock for lock ordering 
purposes, but are now releasing it out of order. But it's starting to sink in 
with me that the root of our problem here is that lock order dictates grabbing 
handlerLock first, and we need to eventually wait on threadLock but with 
handlerLock released, which suggest the lock ordering is reversed. There is 
probably some design bug here that results in that, but I'd hate to mess with 
the ordering of these two locks to try to fix it. Maybe a 3rd lock would help 
(wait on some new lock instead of threadLock). We could grab that lock first in 
`doPendingTasks()`, and not have to exit `trackAppResume()` with threadLock 
held, but again this might be more risk than it is worth.

So it we aren't going to change the lock ordering or introduce a 3rd lock, then 
my suggestion would be (after making the above suggested change to release 
threadLock before calling `threadControl_onEventHandlerExit()`) to move the 
locking and unlocking into `doPendingTasks()`. At the start of the 
`node->handlingAppResume` block, grab handlerLock and threadLock, and explain 
that handlerLock is being grabbed because trackAppResume() will (indirectly) 
try to grab it, and grabbing it before threadLock is necessary for lock 
ordering. Then grab threadLock and hold onto it until the end of the block. 
Between the `trackAppResume()` and `blockOnDebuggerSuspend()` calls, release 
handlerLock and explain that it will not be needed anymore, and has to be 
released before calling `blockOnDebuggerSuspend()`.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5849

Reply via email to