Yes, impressive, indeed, thank you, René.

However, one important piece of information is missing: the application did work a couple of years ago, and it still works sometimes, mostly on Linux and macOS. I therefore think that in principle everything is set up correctly, but that some situation arises that causes the hang. Having spent quite some time with that area of the interpreter, I was hoping to get some hints, ideas, or theories about what a possible reason could be. Granted, this is an optimistic request, but hey, if one does not try, one will not get a "lucky punch" hint. If there are no ideas, then I need to go through the code systematically, which may take a lot of time and effort.

---rony


On 13.08.2025 16:08, Gilbert Barmwater via Oorexx-devel wrote:

WOW! Unbelievable that AI could do that, at least to me.  If most of that is, in fact, meaningful - and I have no way of knowing if it is or isn't, way over my head - this is a significant addition to the ability to debug complex code problems.  I have my fingers crossed that this will help Rony find his problem because I want to believe in this approach. Thanks for sharing René!

Gil

On 8/13/2025 9:53 AM, René Jansen via Oorexx-devel wrote:
I asked my buddy AI for you:

Short version: almost everything here is *blocked, waiting on kernel objects/events*. One thread (the one with |rexx.dll| in the stack) is trying to *attach to ooRexx* via BSF4ooRexx while the JVM is already involved, and it’s waiting for the *ooRexx kernel mutex*. Meanwhile several JVM worker threads are also parked in waits. This pattern screams *lock-order inversion / deadlock between Java ↔ ooRexx* (likely “call into Rexx while holding something, which calls back into Java, which tries to attach back into Rexx and blocks on the Rexx global lock”).


      What the stacks say

 * Repeated tops of stack:
   |ntdll!NtWaitForSingleObject → KernelBase!WaitForSingleObjectEx → jvm.dll!...|
   That’s a *parked/waiting thread* (monitor/condition/OS event); not runnable.

 * The interesting one (Not Flagged, tid |> 23728|):
   |win32u!NtUserMsgWaitForMultipleObjectsEx → user32!RealMsgWait… → rexx.dll!waitHandle →
   SysMutex::request → ActivityManager::lockKernel → Activity::waitForKernel →
   ActivityManager::addWaitingActivity → Activity::requestAccess → Activity::nestAttach →
   InterpreterInstance::attachThread → AttachThread → BSF4ooRexx850.dll …|
   This shows a *BSF/ooRexx attach* trying to acquire the *Rexx kernel lock* and
   *waiting* (message-wait variant, so it can pump messages).

 * Many other JVM threads show the same wait pattern at different internal pcs
   (|jvm.dll!0x7117e75a|, |…e82f|, etc.). That’s consistent with *Java threads parked on
   monitors/conditions* (e.g., GC, JIT, RMI, pool workers) while some other thread is
   expected to make progress, but isn’t.


      Likely scenario

 1. A thread entered *ooRexx* and still *holds the Rexx kernel mutex* (ooRexx is
    single-kernel-locked).

 2. During that work, it *called into Java* (BSF).

 3. Another thread (or a callback on the same thread through message pumping) is now
    trying to *AttachThread / nestAttach* back into ooRexx → it blocks on
    |SysMutex::request()| because the kernel lock is still held.

 4. Because of that, Java side work that would release the original path can’t continue →
    *deadlock* (or at least a hard stall).
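
To make the scenario above concrete, here is a minimal Java sketch of the same lock-order inversion. Both locks are hypothetical stand-ins (|rexxKernel| is not the real ooRexx kernel mutex and |javaSide| is not a real JVM-internal lock), and the timed |tryLock| is an assumption added only so that the demo reports the stall instead of hanging forever.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockInversionSketch {
    // Hypothetical stand-ins; neither is the real ooRexx kernel mutex nor a real JVM lock.
    static final ReentrantLock rexxKernel = new ReentrantLock();
    static final ReentrantLock javaSide = new ReentrantLock();

    /** Holds 'first', then tries to get 'second'; returns true if 'second' was acquired. */
    static boolean attempt(ReentrantLock first, ReentrantLock second,
                           CountDownLatch bothHoldFirst, CountDownLatch bothTried) {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await();               // both threads now hold their first lock
            boolean got = second.tryLock(200, TimeUnit.MILLISECONDS);
            if (got) {
                second.unlock();
            }
            bothTried.countDown();
            bothTried.await();                   // keep 'first' until the other side has tried too
            return got;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            first.unlock();
        }
    }

    /** Runs both directions concurrently; index 0 is Rexx-to-Java, index 1 is Java-to-Rexx. */
    static boolean[] run() throws InterruptedException {
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        boolean[] gotSecond = new boolean[2];
        Thread a = new Thread(
            () -> gotSecond[0] = attempt(rexxKernel, javaSide, bothHoldFirst, bothTried),
            "rexx-calls-java");
        Thread b = new Thread(
            () -> gotSecond[1] = attempt(javaSide, rexxKernel, bothHoldFirst, bothTried),
            "java-attaches-rexx");
        a.start();
        b.start();
        a.join();
        b.join();
        return gotSecond;
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] r = run();
        System.out.println("rexx-calls-java got the Java-side lock: " + r[0]);
        System.out.println("java-attaches-rexx got the kernel lock: " + r[1]);
    }
}
```

Each thread keeps its first lock until the other has finished trying, so both attempts time out by construction. With untimed waits, as in the stack above, this would be a permanent hang.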


      How to confirm quickly

 * *Grab a full dump* and in WinDbg run:

    o |!locks| (who owns the critical section / mutex in |rexx.dll|).

    o |~* kb| to see if any thread is inside |rexx.dll| /not/ waiting (the probable
      owner).

 * From the Java side, run |jcmd <pid> Thread.print| (or |jstack|) and look for:

    o Threads in |parking to wait for <…>| with *owning thread* info.

    o Any thread doing a *native call into BSF/ooRexx*.

 * In Process Explorer’s Threads tab, the *Waiting reason* for the Rexx-attaching thread
   will be a wait on an event/mutex handle; the *owner* (if resolvable) points to the
   holder.
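
If attaching |jcmd| or |jstack| to the hung process is awkward, roughly the same Java-side information can be collected from inside the JVM, for example from a watchdog thread. The following is a minimal sketch using the standard |java.lang.management| API; the class and method names are made up for illustration. Note the limitation: |findDeadlockedThreads()| only knows about Java monitors and |java.util.concurrent| locks, so it would not see a native mutex inside |rexx.dll|. A thread stuck there would at best show up as being in native code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class JavaSideWaitReport {
    /** Lists threads that are in native code or blocked on a Java-level lock. */
    static String describeWaiters() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();

        // null means no deadlock among Java monitors / ownable synchronizers.
        long[] deadlocked = mx.findDeadlockedThreads();
        sb.append("java-level deadlock: ").append(deadlocked != null).append('\n');

        for (ThreadInfo ti : mx.dumpAllThreads(true, true)) {
            if (ti.isInNative() || ti.getLockName() != null) {
                sb.append(ti.getThreadName())
                  .append(" state=").append(ti.getThreadState())
                  .append(" inNative=").append(ti.isInNative())
                  .append(" waitingOn=").append(ti.getLockName())
                  .append(" owner=").append(ti.getLockOwnerName())
                  .append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describeWaiters());
    }
}
```

A thread reported with |inNative=true| and no Java-level lock owner while everything else waits would be a candidate for the one sitting in the BSF/ooRexx call.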


      Practical fixes / mitigations

 * *Never call back into Rexx while holding the Rexx kernel lock.* In native ooRexx
   extensions this usually means:

    o Minimize the critical section; *release the kernel* before making Java/BSF calls
      that can reenter.

    o If you must call out, *queue* work (post to another thread) instead of re-entering
      synchronously.

 * For *BSF4ooRexx*:

    o Ensure every |AttachThread| is matched with |DetachThread| in a |try/finally|.

    o Avoid *nested attaches* (|nestAttach| shows on your stack). If you’re already
      attached, reuse the context; don’t attach again.

    o If Java code calls into Rexx, and Rexx code calls back into Java, enforce a single
      direction at a time (e.g., schedule the callback asynchronously).

 * Add diagnostics:

    o On the Rexx side, enable verbose/trace around *kernel lock* operations if available;
      log when you enter/leave Rexx from Java and vice versa.

    o On the Java side, log before/after *BSF calls* and include current thread name +
      whether already attached.
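
The attach/detach pairing, the "don't attach again" rule and the Java-side logging can be combined in one small wrapper. This is only a sketch under stated assumptions: |NativeBridge| and its |attach()|/|detach()| methods are hypothetical stand-ins for whatever performs the real attach and detach, not an actual BSF4ooRexx or ooRexx API, and the log is kept in memory just for the demo.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

public class AttachGuardSketch {
    /** Hypothetical stand-in for the native attach/detach pair; not a real BSF4ooRexx API. */
    interface NativeBridge {
        void attach();
        void detach();
    }

    private final NativeBridge bridge;
    // Per-thread nesting depth, so a thread that is already attached does not attach again.
    private final ThreadLocal<Integer> depth = ThreadLocal.withInitial(() -> 0);
    final List<String> log = Collections.synchronizedList(new ArrayList<>());

    AttachGuardSketch(NativeBridge bridge) {
        this.bridge = bridge;
    }

    /** Runs 'work' attached; attaches and detaches only at the outermost call per thread. */
    <T> T callIntoRexx(String what, Supplier<T> work) {
        String thread = Thread.currentThread().getName();
        boolean outermost = depth.get() == 0;
        log.add(thread + " enter " + what + " alreadyAttached=" + !outermost);
        if (outermost) {
            bridge.attach();
        }
        depth.set(depth.get() + 1);
        try {
            return work.get();
        } finally {
            depth.set(depth.get() - 1);
            if (outermost) {
                bridge.detach();      // always paired with the attach above
            }
            log.add(thread + " leave " + what);
        }
    }

    public static void main(String[] args) {
        AttachGuardSketch guard = new AttachGuardSketch(new NativeBridge() {
            public void attach() { System.out.println("attach"); }
            public void detach() { System.out.println("detach"); }
        });
        Supplier<String> inner = () -> "done";
        String result = guard.callIntoRexx("outer", () -> guard.callIntoRexx("inner", inner));
        System.out.println(result);
        guard.log.forEach(System.out::println);
    }
}
```

In the nested call the inner invocation reuses the existing attachment, so the bridge sees exactly one attach and one detach, and the log lines show which thread entered and left and whether it was already attached.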


      If it’s not a full deadlock (just a stall)

It can still be *head-of-line blocking*: one long-running Rexx activity holds the kernel, and many threads pile up on |requestAccess()|. The cure is the same—*shorten the locked region* or make the long task cooperative (yield/release).
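
A tiny Java sketch of the difference, assuming a plain |ReentrantLock| as a hypothetical stand-in for the kernel lock (the real ooRexx kernel lock is managed by the interpreter and cannot be driven like this from user code): a probe thread tries to get the lock while the "long task" runs, once with the lock held for the whole task and once with it released.

```java
import java.util.concurrent.locks.ReentrantLock;

public class ShortLockSketch {
    // Hypothetical stand-in for the kernel lock; not the real ooRexx mutex.
    static final ReentrantLock kernel = new ReentrantLock();

    /** Returns true if another thread could take the lock while the long task was running. */
    static boolean probeDuringWork(boolean holdLockDuringWork) throws InterruptedException {
        boolean[] probeGotLock = new boolean[1];
        Thread probe = new Thread(() -> {
            if (kernel.tryLock()) {           // non-blocking attempt
                probeGotLock[0] = true;
                kernel.unlock();
            }
        }, "probe");

        if (holdLockDuringWork) {
            kernel.lock();
        }
        try {
            // The probe's lifetime stands in for the duration of the long task.
            probe.start();
            probe.join();
        } finally {
            if (holdLockDuringWork) {
                kernel.unlock();
            }
        }
        return probeGotLock[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("lock held during work, probe got in: " + probeDuringWork(true));
        System.out.println("lock released during work, probe got in: " + probeDuringWork(false));
    }
}
```

With the lock held the probe is turned away; with the lock released during the long part it gets in, which is the whole point of shortening the locked region.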

If you paste the owner of the Rexx mutex from |!locks| (or a |jstack| snippet showing the thread doing the call into Rexx while others block), I can point at the exact offender and the safest place to release the lock.

best regards,

René.


_______________________________________________
Oorexx-devel mailing list
Oorexx-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/oorexx-devel
--
Gil Barmwater



--
--
__________________________________________________________________________________

Prof. Dr. Rony G. Flatscher, iR
Department Wirtschaftsinformatik und Operations Management
WU Wien
Welthandelsplatz 1
A-1020  Wien/Vienna, Austria/Europe

http://www.wu.ac.at
__________________________________________________________________________________



