WOW! Unbelievable that AI could do that, at least to me. If most of
that is, in fact, meaningful - and I have no way of knowing whether it
is or isn't; it's way over my head - this is a significant addition to
the ability to debug complex code problems. I have my fingers crossed
that this will help Rony find his problem, because I want to believe
in this approach. Thanks for sharing, René!
Gil
On 8/13/2025 9:53 AM, René Jansen via Oorexx-devel wrote:
I asked my buddy AI for you:
Short version: almost everything here is *blocked, waiting on kernel
objects/events*. One thread (the one with |rexx.dll| in the stack) is
trying to *attach to ooRexx* via BSF4ooRexx while the JVM is already
involved, and it’s waiting for the *ooRexx kernel mutex*. Meanwhile
several JVM worker threads are also parked in waits. This pattern
screams *lock-order inversion / deadlock between Java ↔
ooRexx* (likely “call into Rexx while holding something, which calls
back into Java, which tries to attach back into Rexx and blocks on the
Rexx global lock”).
What the stacks say
  * Repeated tops of stack:
    |ntdll!NtWaitForSingleObject → KernelBase!WaitForSingleObjectEx → jvm.dll!...|
    That’s a *parked/waiting thread* (monitor/condition/OS event); not runnable.
  * The interesting one (Not Flagged, tid |> 23728|):
    |win32u!NtUserMsgWaitForMultipleObjectsEx → user32!RealMsgWait… →
    rexx.dll!waitHandle → SysMutex::request → ActivityManager::lockKernel →
    Activity::waitForKernel → ActivityManager::addWaitingActivity →
    Activity::requestAccess → Activity::nestAttach →
    InterpreterInstance::attachThread → AttachThread → BSF4ooRexx850.dll …|
    This shows a *BSF/ooRexx attach* trying to acquire the *Rexx kernel
    lock* and *waiting* (message-wait variant, so it can pump messages).
  * Many other JVM threads show the same wait pattern at different internal
    pcs (|jvm.dll!0x7117e75a|, |…e82f|, etc.). That’s consistent with *Java
    threads parked on monitors/conditions* (e.g., GC, JIT, RMI, pool workers)
    while some other thread is expected to make progress - but isn’t.
Likely scenario
 1. A thread entered *ooRexx* and still *holds the Rexx kernel mutex*
    (ooRexx is single-kernel-locked).
 2. During that work, it *called into Java* (BSF).
 3. Another thread (or a callback on the same thread through message
    pumping) is now trying to *AttachThread / nestAttach* back into
    ooRexx → it blocks on |SysMutex::request()| because the kernel lock
    is still held.
 4. Because of that, Java-side work that would release the original path
    can’t continue → *deadlock* (or at least a hard stall). A minimal
    sketch of this pattern follows below.
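To make that concrete, here is a minimal, self-contained Java sketch of the
suspected pattern. It is an illustration only: a plain |ReentrantLock| stands
in for the ooRexx kernel mutex, and ordinary threads stand in for the Rexx
activity and the Java callback; none of the names below are real BSF4ooRexx
or ooRexx APIs.

    import java.util.concurrent.locks.ReentrantLock;

    // Illustration only: a ReentrantLock stands in for the ooRexx kernel mutex.
    public class KernelLockDeadlockSketch {
        private static final ReentrantLock kernelLock = new ReentrantLock();

        public static void main(String[] args) throws InterruptedException {
            // "Rexx activity": takes the kernel lock, then calls into "Java"
            // and waits for the result while still holding the lock.
            Thread rexxActivity = new Thread(() -> {
                kernelLock.lock();                 // step 1: enter ooRexx, hold the kernel
                try {
                    callIntoJavaAndWait();         // step 2: call out to Java (BSF) synchronously
                } finally {
                    kernelLock.unlock();           // never reached while step 3 blocks
                }
            }, "rexx-activity");
            rexxActivity.start();
            rexxActivity.join();
        }

        private static void callIntoJavaAndWait() {
            // "Java side": tries to attach back into Rexx on another thread,
            // while the caller blocks waiting for it -> hard stall.
            Thread javaCallback = new Thread(() -> {
                kernelLock.lock();                 // step 3: re-attach blocks on the kernel lock
                try {
                    System.out.println("never reached");
                } finally {
                    kernelLock.unlock();
                }
            }, "java-callback");
            javaCallback.start();
            try {
                javaCallback.join();               // step 4: original path cannot continue either
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

Run it and a thread dump shows exactly the shape in your stacks:
|rexx-activity| owns the lock and waits in |join()|, |java-callback| waits on
the lock, and nothing makes progress.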
How to confirm quickly
  * *Grab a full dump* and in WinDbg run:
      o |!locks| (who owns the critical section / mutex in |rexx.dll|).
      o |~* kb| to see if any thread is inside |rexx.dll| and /not/ waiting
        (the probable owner).
  * From the Java side, run |jcmd <pid> Thread.print| (or |jstack|) and look
    for (a small in-process check is also sketched after this list):
      o Threads in |parking to wait for <…>| with *owning thread* info.
      o Any thread doing a *native call into BSF/ooRexx*.
  * In Process Explorer’s Threads tab, the *Waiting reason* for the
    Rexx-attaching thread will be a wait on an event/mutex handle; the
    *owner* (if resolvable) points to the holder.
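In addition to |jcmd|/|jstack|, a tiny helper run inside the affected JVM can
ask the JVM itself for deadlocked threads. This is my own sketch, not part of
BSF4ooRexx, and it only sees Java-level monitors and |java.util.concurrent|
locks - a wait on the native mutex inside |rexx.dll| is invisible to it - but
it quickly rules the pure-Java case in or out.

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    // Must run inside the affected JVM (e.g. from a debug menu or management hook).
    // Only detects Java-level deadlocks; native waits in rexx.dll are not visible here.
    public class DeadlockCheck {
        public static void dumpDeadlocks() {
            ThreadMXBean mx = ManagementFactory.getThreadMXBean();
            long[] ids = mx.findDeadlockedThreads();     // monitors + ownable synchronizers
            if (ids == null) {
                System.out.println("No Java-level deadlock detected.");
                return;
            }
            for (ThreadInfo info : mx.getThreadInfo(ids, Integer.MAX_VALUE)) {
                System.out.println(info);                // name, state, lock owner, stack frames
            }
        }
    }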
Practical fixes / mitigations
  * *Never call back into Rexx while holding the Rexx kernel lock.* In
    native ooRexx extensions this usually means:
      o Minimize the critical section; *release the kernel* before making
        Java/BSF calls that can reenter.
      o If you must call out, *queue* work (post to another thread) instead
        of re-entering synchronously.
  * For *BSF4ooRexx*:
      o Ensure every |AttachThread| is matched with a |DetachThread| in a
        |try/finally|.
      o Avoid *nested attaches* (|nestAttach| shows on your stack). If
        you’re already attached, reuse the context; don’t attach again.
      o If Java code calls into Rexx, and Rexx code calls back into Java,
        enforce a single direction at a time (e.g., schedule the callback
        asynchronously; see the sketch after this list).
  * Add diagnostics:
      o On the Rexx side, enable verbose/trace around *kernel lock*
        operations if available; log when you enter/leave Rexx from Java
        and vice versa.
      o On the Java side, log before/after *BSF calls* and include the
        current thread name + whether it is already attached.
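A hedged Java sketch of the last two points, purely to show the shape of the
fix: |attachToRexx()|, |detachFromRexx()| and |callRexx()| are placeholders I
made up (the real attach/detach lives in the BSF4ooRexx native layer), and
the single-threaded executor is one way to enforce "one direction at a time".

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    // Hypothetical bridge wrapper: attach/detach/call methods are placeholders,
    // NOT the real BSF4ooRexx API; they only show the shape of the fix.
    public class RexxBridgeSketch {

        // All calls into Rexx are funneled through one worker thread, so Java
        // callbacks never try to re-enter Rexx from an arbitrary thread.
        private final ExecutorService rexxWorker = Executors.newSingleThreadExecutor();

        public Future<Object> submitToRexx(String routine, Object... args) {
            return rexxWorker.submit(() -> {
                attachToRexx();                     // placeholder: attach exactly once...
                try {
                    return callRexx(routine, args); // placeholder: the actual bridge call
                } finally {
                    detachFromRexx();               // ...and always detach, even on error
                }
            });
        }

        // Rexx -> Java callbacks should not call straight back into Rexx;
        // they queue the follow-up instead (single direction at a time).
        public void onRexxCallback(String followUpRoutine) {
            submitToRexx(followUpRoutine);          // asynchronous, no nested attach
        }

        // Placeholders so the sketch compiles; the real calls live in BSF4ooRexx.
        private void attachToRexx()   { /* native AttachThread would happen here */ }
        private void detachFromRexx() { /* matching DetachThread would happen here */ }
        private Object callRexx(String routine, Object... args) { return null; }
    }

The single worker trades some latency for the guarantee that at most one
thread is attached at a time; if you really need parallel Rexx calls, the
try/finally discipline still applies on every calling thread.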
If it’s not a full deadlock (just a stall)
It can still be *head-of-line blocking*: one long-running Rexx
activity holds the kernel, and many threads pile up on
|requestAccess()|. The cure is the same—*shorten the locked region* or
make the long task cooperative (yield/release).
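For the cooperative variant, a small Java-side sketch of the same idea,
again with a made-up |callRexx()| placeholder: split one long bridge call
into small batches so the kernel lock is free between batches and queued
attachers get their turn.

    import java.util.List;

    // Hypothetical chunking: process a large work list in small batches so the
    // Rexx kernel lock is released between batches and waiting threads can run.
    public class ChunkedRexxWork {
        private static final int BATCH_SIZE = 50;   // assumption: tune for your workload

        public static void processAll(List<String> items) {
            for (int i = 0; i < items.size(); i += BATCH_SIZE) {
                List<String> batch = items.subList(i, Math.min(i + BATCH_SIZE, items.size()));
                callRexx("processBatch", batch);     // placeholder bridge call, returns quickly
                // Between iterations no Rexx attach is held, so other activities proceed.
            }
        }

        private static void callRexx(String routine, Object arg) { /* placeholder */ }
    }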
If you paste the owner of the Rexx mutex from |!locks| (or a
|jstack| snippet showing the thread doing the call into Rexx while
others block), I can point at the exact offender and the safest place
to release the lock.
best regards,
René.
--
Gil Barmwater
_______________________________________________
Oorexx-devel mailing list
Oorexx-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/oorexx-devel