Hi,
I’m revisiting this because although my patch fixed the problem and did not show up any new ones, strictly speaking there is one more thing that needs to be done for correctness and future proofing The patch as it stands carries out an extra lock release in InterpreterInstance::terminate whether it needs to or not just to deal with the case where an extra one is actually needed because a new activity was created. For cases where it is not needed this extra unlock theoretically leads to the lock count becoming negative. The locks we currently use are forgiving of and ignore this but it is still wrong, and were we ever to switch to using different locks (e.g. C++ recursive_mutexes – which I *might* have tested locally) or should future operating system releases be less forgiving this imbalance could causes deadlocks or other errors when interpreter instances shutdown “normally” (i.e. without the extra activity creation) The attached patch adds an “activity created” flag in a few places so that InterpreterInstance::terminate can determine when a new activity was actually created and only carry out the extra unlock when it was. I’ve added this to the ticket as well for completeness. Kind Regards, Dom From: Rony G. Flatscher <[email protected]> Sent: 25 August 2025 14:31 To: [email protected] Subject: Re: [Oorexx-devel] No, only the hang bug left and that can be solved with Dom Wise's patch (Re: Maybe two different bugs? A question ad fixing the hang Judging from the results on Jenkins this patch did not cause any failures, hence marking the reported bug (https://sourceforge.net/p/oorexx/bugs/2030/) as being fixed. ---rony On 24.08.2025 11:02, Rony G. Flatscher wrote: On 23.08.2025 23:30, [email protected] <mailto:[email protected]> wrote: Attached is the diff for the full fix (from clean svn 13006 ) which only nudges the dispatch queue once. +1 Thank you very much, Dom, kudos! Tested it against the ooRexx test suite on my Windows machine which passes. Also tested it against the JavaFX ooRexx application (BSF4ooRexx850\samples\JavaFX\MainApp.rex) under Java 8LTS, 17LTS and 24, which now thankfully works again, which is really *great*! Will commit your patch and we will see via the Jenkins managed operating systems how the ooRexx test suite fares on all of them. Cheers ---rony From: <mailto:[email protected]> [email protected] <mailto:[email protected]> <[email protected]> Sent: 23 August 2025 21:49 To: 'Open Object Rexx Developer Mailing List' <mailto:[email protected]> <[email protected]> Subject: RE: [Oorexx-devel] No, only the hang bug left and that can be solved with Dom Wise's patch (Re: Maybe two different bugs? A question ad fixing the hang Also changed in Activity.hpp (missed this earlier) -+ void exitCurrentThread(bool dispatch = true); I’ll put together a proper diff From: <mailto:[email protected]> [email protected] < <mailto:[email protected]> [email protected]> Sent: 23 August 2025 20:24 To: 'Open Object Rexx Developer Mailing List' < <mailto:[email protected]> [email protected]> Subject: RE: [Oorexx-devel] No, only the hang bug left and that can be solved with Dom Wise's patch (Re: Maybe two different bugs? A question ad fixing the hang Hi Rony, Nice work! As regards to my possible fix I was thinking that it might be a good idea to modify it slightly with a few additional small changes. What I’m thinking is that ActivityManager::releaseAccess should have a flag added which determines whether or not to call dispatchNext(). This is because in InterpreterInstance::terminate, calling exitCurrentThread() will ultimately call this this and so will the additional “fallback” ActivityManager::releaseAccess call made later on. It looks like ActivityManager::dispatchNext removes the head of the activity queue and tries to set it running. If this is called twice I’m not sure whether this would cause problems. What might be a good idea is for Activity::exitCurrentThread to have an optional bool “dispatch” parameter, set to false in this one instance, along with Activity::releaseAccess which it calls, and have that pass through the parameter to ActivityManager::releaseAccess, modified to only only call dispatchNext() when the flag is true. This exitCurrentThread would then release the kernel lock but not immediately try to dispatch another thread. The newly added call to ActivityManager::releaseAccess (the “fix”) would have the usual call to release (if still held) the kernel lock AND call dispatchNext, ensuring it always gets called exactly once. Below is a summary of the changes. Lines with -+ are changed lines. If this seems like a good idea let me know and I’ll get a fresh SVN tree to make the changes in so that I can provide a proper diff file. Likewise if there is any critique or suggestion for modification I’m all ears! In Activity.hpp -+ void releaseAccess(bool dispatch = true); In Activity.cpp void Activity::exitCurrentThread(bool dispatch) { …. -+ releaseAccess(dispatch); } void Activity::releaseAccess(bool dispatch) { … -+ ActivityManager::releaseAccess(dispatch); } } In ActivityManager.hpp -+ static void releaseAccess(bool dispatch = true); In ActivityManager.cpp void ActivityManager::releaseAccess(bool dispatch) { … ++ if (dispatch) ++ { dispatchNext(); ++ } } In InterpreterInstance.cpp bool InterpreterInstance::terminate() { … -+ current->exitCurrentThread(false); … -+ current->exitCurrentThread(false); … ++ ActivityManager::releaseAccess(); // the ‘fix’ Kind Regards, Dom From: Rony G. Flatscher < <mailto:[email protected]> [email protected]> Sent: 23 August 2025 17:13 To: <mailto:[email protected]> [email protected] Subject: [Oorexx-devel] No, only the hang bug left and that can be solved with Dom Wise's patch (Re: Maybe two different bugs? A question ad fixing the hang On 22.08.2025 21:58, Rony G. Flatscher wrote: On 22.08.2025 13:31, Rony G. Flatscher wrote: Having tested the interpreter without Dom's patch with the samples fxml_25, fxml_26, fxml_27, and fxml_99 ("BSF4ooRexx850\samples\JavaFX") the following could be observed using different versions of Java (Java 8 and in the end Java 24): 1. the hang of fxml_99 occurs on all Java versions 2. fxml_25, fxml_26, and fxml_27 seem to work on Java 8 1. fxml_26 and fxml_27 crash on Java 24 This leads me to believe that there might be two different problems here. Fixing the hang somehow causes the crashes of fxml_26 and fxml_27 (and then sometimes of fxml_25) on both, Java 8 and Java 24, so they seem to be uncovered earlier. ... cut ... One important note: here "crashes" should be rephrased to "Java NullPointer exceptions", these are *not* crashes of the process! Will runtime debug BSF.CLS, JNI and the Java side of the bridge once more, but this may take some time as glimpses of private life take precedence this weekend ... ;) After going through the BSF.CLS, BSF4ooRexx.cc/JNI, Java bridge programs (creating MBs of debug data) I found out that the version of BSF4ooRexx850 I have been working got tampered by myself! :-( After inspecting my changes (basically debug statements) I found the culprit (it was an error in the Java reflection part of Java constructors). Tested fxml_20, fxml_25, fxml_26, fxml_27, fxml_99 on Java 8, Java 17 and Java 24. They work thankfully! :) --- The hang bug (fxml_99) can be fixed with Dom Wise's patch. So far no feedback to the contrary has been given and the test to apply it a hundred times when terminating an interpreter instance without any side effects (which would be surprising seeing the code that gets executed), such that I would like to apply his patch (unlocking the kernel by invoking "ActivityManager::releaseAccess()" right before invoking "Intepreter::terminateInterpreterInstance(this)) to trunk: Index: interpreter/runtime/InterpreterInstance.cpp =================================================================== --- interpreter/runtime/InterpreterInstance.cpp (revision 13006) +++ interpreter/runtime/InterpreterInstance.cpp (working copy) @@ -559,6 +559,7 @@ commandHandlers = OREF_NULL; requiresFiles = OREF_NULL; +ActivityManager::releaseAccess(); // tell the main interpreter controller we're gone. Interpreter::terminateInterpreterInstance(this); Any objections? With fixing the hang bug there are no open show-stopper bugs in ooRexx anymore! ---rony
rebalance-terminate-kernel-locks.diff
Description: Binary data
_______________________________________________ Oorexx-devel mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/oorexx-devel
