Dear Rony,
I think there may actually be only one “cause” for the multiple lock acquires after all. Some additional errors appeared when I tried to fix the one below (with some attempts at lock-balancing logic), which suggested a second issue. It turns out I was initially using the wrong unlock function, and switching to the one I submitted was all I needed to do.

My understanding is as follows (all methods are members of InterpreterInstance unless specified):

* terminate() calls enterOnCurrentThread() to get a valid Activity for running any Uninit methods
* enterOnCurrentThread() calls attachThread(*noargs*)
* attachThread(*noargs*) checks for an existing usable activity with findActivity(). If it finds one, it returns it and there is no change to the kernel lock count at this point. If not, it creates a new one via ActivityManager::attachThread(), which acquires but does not release the kernel lock
* On returning to enterOnCurrentThread(), Activity::requestApiAccess() is called, which either immediately acquires the kernel lock for the returned activity if it is available or puts the activity into a wait queue. This is on top of any earlier acquire in the bold code

If the bold code is executed then the calling thread will end up holding the kernel lock after terminate() completes.

I’m not sure exactly what additional checks need to be carried out, or where an extra release needs to go, to ensure that all functions in this call chain work correctly under all circumstances and only ever unlock as many times as they locked, hence the “hack”.

I hope this helps to clarify the issue further, and of course if there are any glaring errors in my analysis I’ll only learn if someone (gently I hope!) points them out to me 😊

Kind Regards,
Dom

From: Rony G. Flatscher <rony.flatsc...@wu.ac.at>
Sent: 17 August 2025 20:12
To: oorexx-devel@lists.sourceforge.net
Subject: Re: [Oorexx-devel] Some more information, and a random run (Re: Question ad a hang situation

Dear Dom,

On 17.08.2025 17:31, dominicjw...@gmail.com wrote:

> I am one such masochist that Rony was referring to 😊

Very glad that you are one! ;)

> It seems that in InterpreterInstance::terminate() there can be situations where the kernel lock ends up being taken twice. It is only ever released once, which means that when it has been locked twice the thread leaves the function still owning the kernel lock. That thread can go on happily locking and releasing it, but any other thread trying to acquire it is immediately and permanently blocked. I spent quite some time trying to figure out and code for the individual cases (I think there may be at least two) to ensure the acquires and releases are always balanced, but in the end I gave up and as a hack added an extra call to ActivityManager::releaseAccess() at the end of this function. This seems to have done the trick and the program runs ok with this in place.

This is *great*, thank you very much! Could you briefly describe the two individual cases, if you still have them handy?

> The reasons for the additional kernel locks may be easy to identify, or they may be a symptom of problems elsewhere that could be very subtle or nasty, but hopefully this provides a starting point

Given that your hack removes the hang, and given that the test suite runs to completion on my Windows machine, I would like to apply that hack to trunk to see how it does on all Jenkins-hosted operating systems.
Here is the patch I have been using and testing (yes, it fixes the hang, *very* appreciated!!):

Index: interpreter/runtime/InterpreterInstance.cpp
===================================================================
--- interpreter/runtime/InterpreterInstance.cpp (revision 13004)
+++ interpreter/runtime/InterpreterInstance.cpp (working copy)
@@ -299,6 +299,7 @@
     {
         terminationSem.post();
     }
+    ActivityManager::releaseAccess(); // hack kudos to Dom Wise (20250817 e-mail on developer list), remove kernel lock again
     return true;
 }

Of course, once a fix can be found, this patch should be replaced. First I will create an entry in the bug tracker and then commit that patch and record it against the bug.

Dom, again *many* thanks for your efforts and sharing your findings, *very* much appreciated!

Best regards

---rony

-----Original Message-----
From: Rony G. Flatscher <rony.flatsc...@wu.ac.at>
Sent: 17 August 2025 09:30
To: oorexx-devel@lists.sourceforge.net
Subject: Re: [Oorexx-devel] Some more information, and a random run (Re: Question ad a hang situation

Hi Michael,

On 17.08.2025 03:47, Michael Lueck wrote:

> W O W ! ! ! Bravo!!! A most impressive success milestone!

but not solved yet. It looks as if it is somehow linked to terminating interpreter instances right before the callback from Java into ooRexx. The change I made was a debug output statement from the Java garbage collector run (collecting the peer Java RexxEngines) right before the ooRexx instance gets terminated from the Java side, as if this little time span makes the difference (at least on my machine) and allows the Java callback into ooRexx to not block.

---rony
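
The call chain Dom describes at the top of this thread can be modelled with a small counting lock. What follows is only a sketch under stated assumptions, not ooRexx code: ToyKernelLock and the toy* functions are hypothetical names invented for illustration, the lock is assumed to behave like a per-thread acquire counter, requestApiAccess() is assumed to get the lock immediately, and the "bold code" is assumed to be the path that creates a new activity. The guard around the extra release is also an assumption of the sketch; the patch above calls ActivityManager::releaseAccess() unconditionally.

```cpp
#include <cassert>

// Hypothetical stand-in for the kernel lock: it only counts the acquires and
// releases made by the single thread being modelled. NOT the ooRexx class.
struct ToyKernelLock {
    int depth = 0;                       // acquires not yet matched by a release
    void acquire() { ++depth; }
    void release() { assert(depth > 0); --depth; }
};

// Models attachThread(): reusing an existing activity leaves the lock count
// unchanged; creating a new one acquires the lock and does not release it.
void toyAttachThread(ToyKernelLock &lock, bool haveUsableActivity) {
    if (!haveUsableActivity) {
        lock.acquire();
    }
}

// Models enterOnCurrentThread(): attach, then request API access, which
// acquires the lock (assumed to be immediately available here).
void toyEnterOnCurrentThread(ToyKernelLock &lock, bool haveUsableActivity) {
    toyAttachThread(lock, haveUsableActivity);
    lock.acquire();
}

// Models terminate(): one enter and one release. Returns how many acquires
// are still outstanding when the function returns (0 means balanced).
int toyTerminate(bool haveUsableActivity, bool applyHack) {
    ToyKernelLock lock;
    toyEnterOnCurrentThread(lock, haveUsableActivity);
    lock.release();                      // the single release that always happens
    if (applyHack && lock.depth > 0) {   // guarded variant of the extra release
        lock.release();
    }
    return lock.depth;
}
```

In this model toyTerminate(false, false) returns 1, which corresponds to the thread leaving terminate() while still holding the lock, and toyTerminate(false, true) returns 0. When an existing activity is reused the count is already balanced, which is why the sketch guards the extra release instead of issuing it unconditionally.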
_______________________________________________
Oorexx-devel mailing list
Oorexx-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/oorexx-devel