WOW! Unbelievable that AI could do that, at least to me.  If most of that is, in fact, meaningful - and I have no way of knowing if it is or isn't, way over my head - this is a significant addition to the ability to debug complex code problems.  I have my fingers crossed that this will help Rony find his problem because I want to believe in this approach.  Thanks for sharing René!

Gil

On 8/13/2025 9:53 AM, René Jansen via Oorexx-devel wrote:
I asked my buddy AI for you:

Short version: almost everything here is *blocked, waiting on kernel objects/events*. One thread (the one with |rexx.dll| in the stack) is trying to *attach to ooRexx* via BSF4ooRexx while the JVM is already involved, and it’s waiting for the *ooRexx kernel mutex*. Meanwhile several JVM worker threads are also parked in waits. This pattern screams *lock-order inversion / deadlock between Java ↔ ooRexx* (likely “call into Rexx while holding something, which calls back into Java, which tries to attach back into Rexx and blocks on the Rexx global lock”).


      What the stacks say

 *  Repeated tops of stack:
    |ntdll!NtWaitForSingleObject → KernelBase!WaitForSingleObjectEx → jvm.dll!...|
    That’s a *parked/waiting thread* (monitor/condition/OS event); not runnable.

 *  The interesting one (Not Flagged, tid |> 23728|):
    |win32u!NtUserMsgWaitForMultipleObjectsEx → user32!RealMsgWait… →
    rexx.dll!waitHandle → SysMutex::request → ActivityManager::lockKernel →
    Activity::waitForKernel → ActivityManager::addWaitingActivity →
    Activity::requestAccess → Activity::nestAttach →
    InterpreterInstance::attachThread → AttachThread → BSF4ooRexx850.dll …|
    This shows a *BSF/ooRexx attach* trying to acquire the *Rexx kernel lock*
    and *waiting* (message-wait variant, so it can pump messages).

 *  Many other JVM threads show the same wait pattern at different internal
    PCs (|jvm.dll!0x7117e75a|, |…e82f|, etc.). That’s consistent with *Java
    threads parked on monitors/conditions* (e.g., GC, JIT, RMI, pool workers)
    while some other thread is expected to make progress, but isn’t.


      Likely scenario

 1. A thread entered *ooRexx* and still *holds the Rexx kernel mutex* (ooRexx
    is single-kernel-locked).

 2. During that work, it *called into Java* (BSF).

 3. Another thread (or a callback on the same thread through message pumping)
    is now trying to *AttachThread / nestAttach* back into ooRexx → it blocks
    on |SysMutex::request()| because the kernel lock is still held.

 4. Because of that, the Java-side work that would release the original path
    can’t continue → *deadlock* (or at least a hard stall); a minimal sketch
    of the pattern follows below.
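
In Java terms, the scenario above boils down to the following shape. This is
a minimal sketch, not the actual BSF4ooRexx code: the ReentrantLock stands in
for the ooRexx kernel mutex, and the submitted task for the
AttachThread/nestAttach re-entry.

    import java.util.concurrent.*;
    import java.util.concurrent.locks.ReentrantLock;

    // Minimal model of the stall: "kernelLock" stands in for the ooRexx
    // kernel mutex, "javaCallback" for the BSF call out of Rexx, and the
    // submitted task for the attempt to attach back into Rexx elsewhere.
    public class AttachDeadlockSketch {
        static final ReentrantLock kernelLock = new ReentrantLock();
        static final ExecutorService pool = Executors.newFixedThreadPool(2);

        public static void main(String[] args) throws Exception {
            kernelLock.lock();              // Rexx activity holds the kernel
            try {
                javaCallback();             // Rexx calls out to Java (BSF)
            } finally {
                kernelLock.unlock();        // never reached: we stall above
            }
            pool.shutdown();
        }

        static void javaCallback() throws Exception {
            // Java work re-enters Rexx on another thread (AttachThread)
            Future<?> reentry = pool.submit(() -> {
                kernelLock.lock();          // blocks: kernel lock still held
                try {
                    /* would run Rexx code here */
                } finally {
                    kernelLock.unlock();
                }
            });
            reentry.get();                  // original thread waits forever
        }
    }

The first thread can’t release the kernel until the callback returns, and the
callback can’t return until the kernel is released: exactly the pile-up on
|requestAccess()| visible in the stacks.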


      How to confirm quickly

 *  *Grab a full dump* and in WinDbg run:

     o  |!locks| (who owns the critical section / mutex in |rexx.dll|).

     o  |~* kb| to see if any thread is inside |rexx.dll| /not/ waiting (the
        probable owner).

 *  From the Java side, run |jcmd <pid> Thread.print| (or |jstack|) and look
    for:

     o  Threads in |parking to wait for <…>| with *owning thread* info.

     o  Any thread doing a *native call into BSF/ooRexx* (a programmatic
        variant is sketched after this list).

 *  In Process Explorer’s Threads tab, the *Waiting reason* for the
    Rexx-attaching thread will be a wait on an event/mutex handle; the
    *owner* (if resolvable) points to the holder.
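
If attaching |jcmd|/|jstack| is awkward, the same Java-side picture is
available in-process through the standard java.lang.management API. A small
sketch (class name and output format are mine; note it only sees Java-level
locks, not the native rexx.dll mutex itself):

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    // In-process equivalent of "jcmd <pid> Thread.print": dump every Java
    // thread and flag the ones that are blocked/waiting or sitting in a
    // native frame (where a call into BSF4ooRexx/rexx.dll would show up).
    public class ThreadDumpCheck {
        public static void main(String[] args) {
            ThreadMXBean mx = ManagementFactory.getThreadMXBean();

            // Java-level deadlocks only; a native-mutex deadlock won't show
            // here, but this rules the pure Java side in or out.
            long[] deadlocked = mx.findDeadlockedThreads();
            System.out.println("Java-level deadlocked threads: "
                    + (deadlocked == null ? "none" : deadlocked.length));

            for (ThreadInfo ti : mx.dumpAllThreads(true, true)) {
                StackTraceElement[] stack = ti.getStackTrace();
                boolean inNative = stack.length > 0 && stack[0].isNativeMethod();
                System.out.printf("%-30s state=%-13s native=%-5s lockOwner=%s%n",
                        ti.getThreadName(), ti.getThreadState(),
                        inNative, ti.getLockOwnerName());
            }
        }
    }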


      Practical fixes / mitigations

 *  *Never call back into Rexx while holding the Rexx kernel lock.* In native
    ooRexx extensions this usually means:

     o  Minimize the critical section; *release the kernel* before making
        Java/BSF calls that can reenter.

     o  If you must call out, *queue* work (post to another thread) instead
        of re-entering synchronously.

 *  For *BSF4ooRexx*:

     o  Ensure every |AttachThread| is matched with |DetachThread| in a
        |try/finally|.

     o  Avoid *nested attaches* (|nestAttach| shows on your stack). If you’re
        already attached, reuse the context; don’t attach again.

     o  If Java code calls into Rexx, and Rexx code calls back into Java,
        enforce a single direction at a time (e.g., schedule the callback
        asynchronously); see the sketch after this list.

 *  Add diagnostics:

     o  On the Rexx side, enable verbose/trace around *kernel lock* operations
        if available; log when you enter/leave Rexx from Java and vice versa.

     o  On the Java side, log before/after *BSF calls* and include the current
        thread name + whether already attached.
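
On the Java side that discipline could look roughly like this. It is a
sketch only: |RexxBridge|, |attach()|, |detach()| and |runRexx()| are
hypothetical placeholders for whatever BSF4ooRexx entry points you actually
use, not its real API.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Sketch: one attach per thread, always detached in finally, and
    // callbacks from Rexx handed off asynchronously instead of re-entering
    // Rexx on the calling thread.
    public class RexxCallDiscipline {
        // Hypothetical bridge; stands in for the real BSF4ooRexx calls.
        interface RexxBridge {
            void attach();
            void detach();
            void runRexx(String script);
        }

        private final RexxBridge bridge;
        private final ThreadLocal<Boolean> attached =
                ThreadLocal.withInitial(() -> false);
        // Single worker owns all Rexx re-entry triggered from callbacks.
        private final ExecutorService rexxWorker =
                Executors.newSingleThreadExecutor();

        RexxCallDiscipline(RexxBridge bridge) { this.bridge = bridge; }

        // Attach only if this thread isn't attached yet, always detach after.
        void callRexx(String script) {
            boolean didAttach = false;
            if (!attached.get()) {
                log("attaching");
                bridge.attach();
                attached.set(true);
                didAttach = true;
            }
            try {
                bridge.runRexx(script);
            } finally {
                if (didAttach) {
                    bridge.detach();
                    attached.set(false);
                    log("detached");
                }
            }
        }

        // Called from a Rexx->Java callback: don't re-enter Rexx
        // synchronously, queue it so the original Rexx activity can release
        // the kernel first.
        void callbackFromRexx(String script) {
            log("queueing callback instead of re-entering");
            rexxWorker.submit(() -> callRexx(script));
        }

        private void log(String msg) {
            System.out.printf("[%s attached=%s] %s%n",
                    Thread.currentThread().getName(), attached.get(), msg);
        }
    }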


      If it’s not a full deadlock (just a stall)

It can still be *head-of-line blocking*: one long-running Rexx activity holds the kernel, and many threads pile up on |requestAccess()|. The cure is the same—*shorten the locked region* or make the long task cooperative (yield/release).
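
“Shorten the locked region” in practice means doing the slow part outside the
lock and re-taking it only to publish the result. A generic sketch (plain
Java locks, purely to show the shape; on the Rexx/native side the equivalent
is releasing the kernel before the long call and requesting it again after):

    import java.util.concurrent.locks.ReentrantLock;

    // Hold the lock only to read inputs and publish results; do the long
    // work unlocked so other activities aren't queued behind it.
    public class ShortCriticalSection {
        private final ReentrantLock lock = new ReentrantLock();
        private String sharedInput = "data";
        private String sharedResult;

        void process() {
            String input;
            lock.lock();
            try {
                input = sharedInput;                 // copy what we need
            } finally {
                lock.unlock();
            }

            String result = longRunningWork(input);  // no lock held here

            lock.lock();
            try {
                sharedResult = result;               // publish under the lock
            } finally {
                lock.unlock();
            }
        }

        private String longRunningWork(String in) {
            return in.toUpperCase();                 // stands in for the slow call
        }
    }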

If you paste the owner of the Rexx mutex from |!locks| (or a |jstack| snippet showing the thread doing the call into Rexx while others block), I can point at the exact offender and the safest place to release the lock.

best regards,

René.



--
Gil Barmwater
_______________________________________________
Oorexx-devel mailing list
Oorexx-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/oorexx-devel
