WOW! Unbelievable that AI could do that, at least to me.  If most of that is, in fact, meaningful - and I have no way of knowing if it is or isn't, way over my head - this is a significant addition to the ability to debug complex code problems.  I have my fingers crossed that this will help Rony find his problem because I want to believe in this approach.  Thanks for sharing René!

Gil

On 8/13/2025 9:53 AM, René Jansen via Oorexx-devel wrote:
I asked my buddy AI for you:

Short version: almost everything here is *blocked, waiting on kernel objects/events*. One thread (the one with |rexx.dll| in the stack) is trying to *attach to ooRexx* via BSF4ooRexx while the JVM is already involved, and it’s waiting for the *ooRexx kernel mutex*. Meanwhile several JVM worker threads are also parked in waits. This pattern screams *lock-order inversion / deadlock between Java ↔ ooRexx* (likely “call into Rexx while holding something, which calls back into Java, which tries to attach back into Rexx and blocks on the Rexx global lock”).


      What the stacks say

 *  Repeated tops of stack:
    |ntdll!NtWaitForSingleObject → KernelBase!WaitForSingleObjectEx → jvm.dll!...|
    That’s a *parked/waiting thread* (monitor/condition/OS event); not runnable.

 *  The interesting one (Not Flagged, tid |> 23728|):
    |win32u!NtUserMsgWaitForMultipleObjectsEx → user32!RealMsgWait… →
    rexx.dll!waitHandle → SysMutex::request → ActivityManager::lockKernel →
    Activity::waitForKernel → ActivityManager::addWaitingActivity →
    Activity::requestAccess → Activity::nestAttach →
    InterpreterInstance::attachThread → AttachThread → BSF4ooRexx850.dll …|
    This shows a *BSF/ooRexx attach* trying to acquire the *Rexx kernel lock*
    and *waiting* (message-wait variant, so it can pump messages).

 *  Many other JVM threads show the same wait pattern at different internal
    PCs (|jvm.dll!0x7117e75a|, |…e82f|, etc.). That’s consistent with *Java
    threads parked on monitors/conditions* (e.g., GC, JIT, RMI, pool workers)
    while some other thread is expected to make progress, but isn’t.


      Likely scenario

 1. A thread entered *ooRexx* and still *holds the Rexx kernel mutex* (ooRexx
    is single-kernel-locked).

 2. During that work, it *called into Java* (BSF).

 3. Another thread (or a callback on the same thread through message pumping)
    is now trying to *AttachThread / nestAttach* back into ooRexx → it blocks
    on |SysMutex::request()| because the kernel lock is still held.

 4. Because of that, the Java-side work that would release the original path
    can’t continue → *deadlock* (or at least a hard stall); a minimal sketch
    of the pattern follows below.
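
In Java terms, the scenario above boils down to the following shape. This is
a minimal sketch, not the actual BSF4ooRexx code: the ReentrantLock stands in
for the ooRexx kernel mutex, and the submitted task for the
AttachThread/nestAttach re-entry.

    import java.util.concurrent.*;
    import java.util.concurrent.locks.ReentrantLock;

    // Minimal model of the stall: "kernelLock" stands in for the ooRexx
    // kernel mutex, "javaCallback" for the BSF call out of Rexx, and the
    // submitted task for the attempt to attach back into Rexx elsewhere.
    public class AttachDeadlockSketch {
        static final ReentrantLock kernelLock = new ReentrantLock();
        static final ExecutorService pool = Executors.newFixedThreadPool(2);

        public static void main(String[] args) throws Exception {
            kernelLock.lock();              // Rexx activity holds the kernel
            try {
                javaCallback();             // Rexx calls out to Java (BSF)
            } finally {
                kernelLock.unlock();        // never reached: we stall above
            }
            pool.shutdown();
        }

        static void javaCallback() throws Exception {
            // Java work re-enters Rexx on another thread (AttachThread)
            Future<?> reentry = pool.submit(() -> {
                kernelLock.lock();          // blocks: kernel lock still held
                try {
                    /* would run Rexx code here */
                } finally {
                    kernelLock.unlock();
                }
            });
            reentry.get();                  // original thread waits forever
        }
    }

The first thread can’t release the kernel until the callback returns, and the
callback can’t return until the kernel is released: exactly the pile-up on
|requestAccess()| visible in the stacks.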


      How to confirm quickly

 *  *Grab a full dump* and in WinDbg run:

     o  |!locks| (who owns the critical section / mutex in |rexx.dll|).

     o  |~* kb| to see if any thread is inside |rexx.dll| /not/ waiting (the
        probable owner).

 *  From the Java side, run |jcmd <pid> Thread.print| (or |jstack|) and look
    for:

     o  Threads in |parking to wait for <…>| with *owning thread* info.

     o  Any thread doing a *native call into BSF/ooRexx* (a programmatic
        variant is sketched after this list).

 *  In Process Explorer’s Threads tab, the *Waiting reason* for the
    Rexx-attaching thread will be a wait on an event/mutex handle; the
    *owner* (if resolvable) points to the holder.
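
If attaching |jcmd|/|jstack| is awkward, the same Java-side picture is
available in-process through the standard java.lang.management API. A small
sketch (class name and output format are mine; note it only sees Java-level
locks, not the native rexx.dll mutex itself):

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    // In-process equivalent of "jcmd <pid> Thread.print": dump every Java
    // thread and flag the ones that are blocked/waiting or sitting in a
    // native frame (where a call into BSF4ooRexx/rexx.dll would show up).
    public class ThreadDumpCheck {
        public static void main(String[] args) {
            ThreadMXBean mx = ManagementFactory.getThreadMXBean();

            // Java-level deadlocks only; a native-mutex deadlock won't show
            // here, but this rules the pure Java side in or out.
            long[] deadlocked = mx.findDeadlockedThreads();
            System.out.println("Java-level deadlocked threads: "
                    + (deadlocked == null ? "none" : deadlocked.length));

            for (ThreadInfo ti : mx.dumpAllThreads(true, true)) {
                StackTraceElement[] stack = ti.getStackTrace();
                boolean inNative = stack.length > 0 && stack[0].isNativeMethod();
                System.out.printf("%-30s state=%-13s native=%-5s lockOwner=%s%n",
                        ti.getThreadName(), ti.getThreadState(),
                        inNative, ti.getLockOwnerName());
            }
        }
    }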


      Practical fixes / mitigations

 *  *Never call back into Rexx while holding the Rexx kernel lock.* In native
    ooRexx extensions this usually means:

     o  Minimize the critical section; *release the kernel* before making
        Java/BSF calls that can reenter.

     o  If you must call out, *queue* work (post to another thread) instead
        of re-entering synchronously.

 *  For *BSF4ooRexx*:

     o  Ensure every |AttachThread| is matched with |DetachThread| in a
        |try/finally|.

     o  Avoid *nested attaches* (|nestAttach| shows on your stack). If you’re
        already attached, reuse the context; don’t attach again.

     o  If Java code calls into Rexx, and Rexx code calls back into Java,
        enforce a single direction at a time (e.g., schedule the callback
        asynchronously); see the sketch after this list.

 *  Add diagnostics:

     o  On the Rexx side, enable verbose/trace around *kernel lock* operations
        if available; log when you enter/leave Rexx from Java and vice versa.

     o  On the Java side, log before/after *BSF calls* and include the current
        thread name + whether already attached.
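
On the Java side that discipline could look roughly like this. It is a
sketch only: |RexxBridge|, |attach()|, |detach()| and |runRexx()| are
hypothetical placeholders for whatever BSF4ooRexx entry points you actually
use, not its real API.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Sketch: one attach per thread, always detached in finally, and
    // callbacks from Rexx handed off asynchronously instead of re-entering
    // Rexx on the calling thread.
    public class RexxCallDiscipline {
        // Hypothetical bridge; stands in for the real BSF4ooRexx calls.
        interface RexxBridge {
            void attach();
            void detach();
            void runRexx(String script);
        }

        private final RexxBridge bridge;
        private final ThreadLocal<Boolean> attached =
                ThreadLocal.withInitial(() -> false);
        // Single worker owns all Rexx re-entry triggered from callbacks.
        private final ExecutorService rexxWorker =
                Executors.newSingleThreadExecutor();

        RexxCallDiscipline(RexxBridge bridge) { this.bridge = bridge; }

        // Attach only if this thread isn't attached yet, always detach after.
        void callRexx(String script) {
            boolean didAttach = false;
            if (!attached.get()) {
                log("attaching");
                bridge.attach();
                attached.set(true);
                didAttach = true;
            }
            try {
                bridge.runRexx(script);
            } finally {
                if (didAttach) {
                    bridge.detach();
                    attached.set(false);
                    log("detached");
                }
            }
        }

        // Called from a Rexx->Java callback: don't re-enter Rexx
        // synchronously, queue it so the original Rexx activity can release
        // the kernel first.
        void callbackFromRexx(String script) {
            log("queueing callback instead of re-entering");
            rexxWorker.submit(() -> callRexx(script));
        }

        private void log(String msg) {
            System.out.printf("[%s attached=%s] %s%n",
                    Thread.currentThread().getName(), attached.get(), msg);
        }
    }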


      If it’s not a full deadlock (just a stall)

It can still be *head-of-line blocking*: one long-running Rexx activity holds the kernel, and many threads pile up on |requestAccess()|. The cure is the same—*shorten the locked region* or make the long task cooperative (yield/release).
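
“Shorten the locked region” in practice means doing the slow part outside the
lock and re-taking it only to publish the result. A generic sketch (plain
Java locks, purely to show the shape; on the Rexx/native side the equivalent
is releasing the kernel before the long call and requesting it again after):

    import java.util.concurrent.locks.ReentrantLock;

    // Hold the lock only to read inputs and publish results; do the long
    // work unlocked so other activities aren't queued behind it.
    public class ShortCriticalSection {
        private final ReentrantLock lock = new ReentrantLock();
        private String sharedInput = "data";
        private String sharedResult;

        void process() {
            String input;
            lock.lock();
            try {
                input = sharedInput;                 // copy what we need
            } finally {
                lock.unlock();
            }

            String result = longRunningWork(input);  // no lock held here

            lock.lock();
            try {
                sharedResult = result;               // publish under the lock
            } finally {
                lock.unlock();
            }
        }

        private String longRunningWork(String in) {
            return in.toUpperCase();                 // stands in for the slow call
        }
    }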

If you paste the owner of the Rexx mutex from |!locks| (or a |jstack| snippet showing the thread doing the call into Rexx while others block), I can point at the exact offender and the safest place to release the lock.

best regards,

René.



--
Gil Barmwater
_______________________________________________
Oorexx-devel mailing list
Oorexx-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/oorexx-devel
