This patch makes the crashes go away with the test program:
Index: interpreter/concurrency/ActivityManager.cpp
===================================================================
--- interpreter/concurrency/ActivityManager.cpp (revision 13003)
+++ interpreter/concurrency/ActivityManager.cpp (working copy)
@@ -568,6 +568,8 @@
*/
void ActivityManager::clearActivityPool()
{
+ ResourceSection lock; // critical section, protect
interaction with activity queues
+
Activity *activity = (Activity *)availableActivities->pull();
while (activity != OREF_NULL)
{
@@ -591,6 +593,8 @@
*/
bool ActivityManager::poolActivity(Activity *activity)
{
+ // ResourceSection lock held by caller: bool
InterpreterInstance::poolActivity(Activity *activity
+
// are we shutting down or have too many threads in the pool?
if (processTerminating || availableActivities->items() >
MAX_THREAD_POOL_SIZE)
{
clearActivityPool() was not sheltered as a critical section by a ResourceSection lock. The test
program now runs without crashes and a magnitude or two faster than before, which is a very pleasant
side effect.
As running the test suite works with no crashes, I will apply the patch to trunk such that Jenkins
can produce and test Rexx interpreters on all of the Jenkins platforms.
---
A request in this context: debugging this has been an enormous, almost irrational effort, also,
because there is important conceptual information about the interpreter missing for those who have
not created it, hence supplying this information is important to make sure that future generations
can learn, understand the architecture quickly to allow newcomers to maintain and/or extend the
interpreter for good! E.g. the ResourceSection serializes access to the various activity queues
which is important, however from the name this is not clear and there is no comment that explains
this, at least I did not find any. Or, what do the different locks shelter, serialize for what
reason. Or, which locks can be used in which sequence (this is buried/scattered in comments in the
source code, yet, getting quickly a conceptual overview of the interpreter would be possible, if
there was a conceptual overview of the architecture and most important parts of the interprter,
including a conceptual overview about flattening, rxapi and the like). This would benefit anyone in
this community who has a desire to understand the interpreter in an easy, quick manner!
---rony
On 06.08.2025 16:56, Rony G. Flatscher wrote:
Continuing to experiment with Sleep(23) to see where it would excercise a positive effect if not
crashing immediately at terminating the second instance on another thread:
* placed "Sleep()" in "Activity *ActivityManager.cpp::getRootActivity()"
o doing a "Sleep(23);" before the statement "Activity *oldActivity =
findActivity();":
+ crashes at terminating the second instance a couple of times, but
eventually it ran to
the end, creating 5,000 instances and terminating each on a
separate thread
o moving the "Sleep(23);" before the statement "Activity *activityObject =
createCurrentActivity();":
+ same as above, a crash terminating the second instance on a
separate thread,
re-running the program in the same session may yield success in
creating and
terminating 5,000 instances
* placed "Sleep()" in "Activity *ActivityManager::createCurrentActivity()"
o doing a "Sleep(23);" before the statement "Activity *activity = new
Activity(p, false);":
+ same as above, crashes at terminating the second instance a couple
of times, but
eventually it ran to the end, creating 5,000 instances and
terminating each on a
separate thread
o moving the "Sleep(23);" before the statement
"allActivities->append(activity);":
+ crashed once terminating the second instance on a separate thread,
re-running the
compiled program in the same session yields success in creating and
terminating all
5,000 instances, running it on new sessions succeeds!
So the Sleep(23) has an effect while appending the new activity to the allActivities queue in
createCurrentActivity(), as if concurrent access to the allActivities queue (a QueueClass) may
somehow cause a problem (the test program creates 5,000 interpreter instances and uses them on the
main thread to output a dot to stdout and immediately after using it, the instance gets terminated
on a separate thread).
Since placing the Sleep(23) right in front of allActivities->append(activity) in
createCurrentActivity(), the test program seems to succeed when run in a separate, new session
repeatedly.
Any ideas, suggestions?
---rony
P.S.: Enclosed the test program as used for these tests: "nmake Makefile2.windows" and
"testTerminate2.exe 1".
6:12
On 05.08.2025 12:51, Rony G. Flatscher wrote:
It seems that the crash can be delayed by a couple of thousand Rexx interpreter instances if
adding (Windows) "Sleep(20);" right before "rootActivity = ActivityManager::getRootActivity();"
in Interpreter.cpp's "InterpreterInstance *Interpreter::createInterpreterInstance(RexxOption
*options)".
Although this is a huge improvement to crashing after three interpreter instance creations after
having terminated one interpreter instance on another thread, it does not solve the problem. So a
real fix is needed (the problem may occur on any host application that wishes to employ ooRexx
scripts as macros, potentially creating and terminating interpreter instances on different
threads, and a crashing ooRexx would crash the entire process and hence the host application,
which is not acceptable for companies to employ ooRexx as a scripting language for their
applications, which would be really a boon for many).
Ultimately, it seems, the crash is linked to invoking "Activity
*ActivityManager::createCurrentActivity()" in ActivityManager.cpp. createCurrentActivity() gets
invoked via createInterpreterInstance()'s invocation of getRootActivity(). The root cause for
the crash seems to be that the interpreter instance gets created sometimes on the wrong
Actvity/thread. Not sure why doing a Sleep(20) before invoking getRootActivity() has such a
dramatic positive effect.
[The other path to invoking createCurrentActivity() is via "Activity
*ActivityManager::attachThread()". Note, the test program does not cause attachThread() to be
invoked as it immediately uses the created interpreter instance to create and run a routine and
then terminates it on a separate thread.]
Not sure how to proceed. Any ideas, suggestions?
---rony
On 30.07.2025 16:50, Rony G. Flatscher wrote:
Using this e-mail thread to report new insights.
Prolog: first of all, it is a little bit bewitched: sometimes it works, sometimes it doesn't.
Then, after many, many hours of all sort of debuggings I arrived at a situation where I could
reproduce the crash. And then, after further inspections and debug output edits, the crashes
would disappear and it took me again hours to become able to get to a version that would crash
in a certain constellation. Because of this I do not dare to edit (make it better readable) the
source of the test program as that caused the vanishment of the crash.
So enclosed you will find the patch "getInterpreterInfosWithIds.diff" that produces debug
information for interpreter instance creations and terminations. The enclosed program
"testTerminate2.cpp" can be compiled with "nmake Makefile2.windows" and produces
"testTerminate2.exe".
What is the purpose of testTerminate2.cpp?
* It creates an interpreter instance and uses it to display the ooRexx
version and then
terminates it, followed by a loop which gets carried out five times in
which a Rexx
interpreter instance gets created, and a little Rexx program
("tid=.context~thread;.error~charOut('<{tid='tid'}>');") gets executed on
the main thread.
* Depending on the command line arguments the termination of each Rexx
interpreter instance in
the loop occurs:
o (case 1) on the main thread, if no arguments are supplied: this works
flawlessly
o on a separate thread, if one or two arguments are supplied: this may
crash or succeed, if
+ (case 2) one argument gets supplied: this crashes in "bool
InterpreterInstance::terminate()" after invoking the statement:
"memoryObject.collectAndUninit(Interpreter::lastInstance());"
+ (case 3) two arguments get supplied: this works MOST of the time,
and if it crashes
it does so as in (case 2) above
With the enclosed version I have not yet managed to crash (case 3) which seems to be just a
matter of time/luck.
However with (case 2) I get the reported crash. The difference between (case 2) and (case 3),
which both terminate the Rexx interpreter instance on a separate thread, is "harmless", as the
only difference in (case 3) is that it carries out a "fflush(stderr)" before terminating the
instance (not sure why that has such a significant effect in this case).
Here the output of running the three cases:
* (case 1) output:
G:\tmp\orx\bugs\crashOnTerminate20250729>rxenv testTerminate2.exe
*** main() - entered testTerminate2.main() ***
<IntInst::inititalize#175/this=I1/thread=T1><IntInst::terminate#529\this=I1\thread=T1>
<IntInst::inititalize#175/this=I2/thread=T1>Created interpreter instance
version=5.2.0 language level=6.05
... terminating this interpreter instance
...<IntInst::terminate#529\this=I2\thread=T1>
<IntInst::inititalize#175/this=I3/thread=T1><IntInst::terminate#529\this=I3\thread=T1>
done.
There is [1] argument, therefore:
testTerminate2.exe: creating total of [5] interpreters, running
[tid=.context~thread;.error~charOut('<{tid='tid'}>');]
terminating on SAME thread
<IntInst::inititalize#175/this=I4/thread=T1><{tid=1}><IntInst::terminate#529\this=I4\thread=T1>
<IntInst::inititalize#175/*this=I5/thread=T1*><IntInst::terminate#529\this=I5\thread=T1>
<IntInst::inititalize#175/this=I6/thread=T1><{tid=1}><IntInst::terminate#529\this=I6\thread=T1>
<IntInst::inititalize#175/*this=I7/thread=T1*><IntInst::terminate#529\this=I7\thread=T1>
<IntInst::inititalize#175/this=I8/thread=T1><{tid=1}><IntInst::terminate#529\this=I8\thread=T1>
<IntInst::inititalize#175/*this=I9/thread=T1*><IntInst::terminate#529\this=I9\thread=T1>
<IntInst::inititalize#175/this=I10/thread=T1><{tid=1}><IntInst::terminate#529\this=I10\thread=T1>
<IntInst::inititalize#175/*this=I11/thread=T1*><IntInst::terminate#529\this=I11\thread=T1>
<IntInst::inititalize#175/this=I12/thread=T1><{tid=1}><IntInst::terminate#529\this=I12\thread=T1>
<IntInst::inititalize#175/*this=I13/thread=T1*><IntInst::terminate#529\this=I13\thread=T1>
---
*** main() - done: created [5] instances, ran
[tid=.context~thread;.error~charOut('<{tid='tid'}>');], each terminated on the
SAME thread
---
*** main() - exiting testTerminate2.main() after Sleep(500); ***
* (case 2) output:
G:\tmp\orx\bugs\crashOnTerminate20250729>rxenv testTerminate2.exe 1
*** main() - entered testTerminate2.main() ***
<IntInst::inititalize#175/this=I1/thread=T1><IntInst::terminate#529\this=I1\thread=T1>
<IntInst::inititalize#175/this=I2/thread=T1>Created interpreter instance
version=5.2.0 language level=6.05
... terminating this interpreter instance
...<IntInst::terminate#529\this=I2\thread=T1>
<IntInst::inititalize#175/this=I3/thread=T1><IntInst::terminate#529\this=I3\thread=T1>
done.
There are [2] arguments, therefore:
testTerminate2.exe: creating total of [5] interpreters, running
[tid=.context~thread;.error~charOut('<{tid='tid'}>');]
terminating on SEPARATE thread
<IntInst::inititalize#175/this=I4/thread=*T1*><{tid=1}>
Loop # [1], created thread [0000000000000210] for terminating instance ...
<IntInst::terminate#529\this=I4\thread=*T2*>
<IntInst::inititalize#175/*this=I5/thread=T2*><IntInst::terminate#529\this=I5\thread=*T2*>
<IntInst::inititalize#175/this=I6/thread=T1><{tid=1}>
Loop # [2], created thread [0000000000000214] for terminating instance ...
<IntInst::terminate#529\this=I6\thread=T3>
G:\tmp\orx\bugs\crashOnTerminate20250729>/** CRASH ** in
memoryObject.collectAndUninit(Interpreter::lastInstance()); /
* (case 3) output:
G:\tmp\orx\bugs\crashOnTerminate20250729>rxenv testTerminate2.exe 1 1
*** main() - entered testTerminate2.main() ***
<IntInst::inititalize#175/this=I1/thread=T1><IntInst::terminate#529\this=I1\thread=T1>
<IntInst::inititalize#175/this=I2/thread=T1>Created interpreter instance
version=5.2.0 language level=6.05
... terminating this interpreter instance
...<IntInst::terminate#529\this=I2\thread=T1>
<IntInst::inititalize#175/this=I3/thread=T1><IntInst::terminate#529\this=I3\thread=T1>
done.
There are [3] arguments, therefore:
testTerminate2.exe: creating total of [5] interpreters, running
[tid=.context~thread;.error~charOut('<{tid='tid'}>');]
terminating on SEPARATE thread (doing a 'fflush(stderr)'
in termination thread)
<IntInst::inititalize#175/this=I4/thread=T1><{tid=1}>
Loop # [1], created thread [0000000000000210] for terminating instance ...
<IntInst::inititalize#175/this=I5/thread=T1><IntInst::terminate#529\this=I4\thread=T2>
<{tid=1}>
Loop # [2], created thread [000000000000022C] for terminating instance ...
<IntInst::inititalize#175/this=I6/thread=T1><IntInst::terminate#529\this=I5\thread=T3>
<{tid=1}>
Loop # [3], created thread [0000000000000218] for terminating instance ...
<IntInst::inititalize#175/this=I7/thread=T1><IntInst::terminate#529\this=I6\thread=T4>
<{tid=1}>
Loop # [4], created thread [0000000000000224] for terminating instance ...
<IntInst::inititalize#175/this=I8/thread=T1><IntInst::terminate#529\this=I7\thread=T5>
<{tid=1}>
Loop # [5], created thread [0000000000000258] for terminating instance ...
---
*** main() - done: created [5] instances, ran
[tid=.context~thread;.error~charOut('<{tid='tid'}>');], each terminated on
SEPARATE threads
---
<IntInst::terminate#529\this=I8\thread=*T6*>
<IntInst::inititalize#175/*this=I9/thread=T6*><IntInst::terminate#529\this=I9\thread=*T6*>
*** main() - exiting testTerminate2.main() after Sleep(500); ***
The creation of the Rexx interpreter instances by the test program occurs on thread T1 on which
the Rexx program gets carried out (producing the string "<{tid=1}>").
(case 1) creates an interpreter in the loop and terminates it on the same thread. Creating and
terminating another interpreter instance thereafter works without a crash.
The difference between (case 2) which causes a crash and (case 3) seems to be the fact, that
(case 2) shuts down with I5 on T2, but continues to create a new interpreter I6 on T1,
terminates it on T3 and there the crash occurs (in the memoryObject.collectAndUninit() call branch).
(case 3) on the other hand does not shut down the interpreter, except after the last created
interpreter I8 got terminated on T6, causing I9 on T6 to shut down, which works without a crash
in this case.
Maybe these observation help shed light on what might be causing the crash.
---rony
On 23.12.2023 18:03, Rony G. Flatscher wrote:
Since last August a crash got reported in <https://sourceforge.net/p/oorexx/bugs/1872/> which
can be easily reproduced by following the directions there. Using the supplied simple C++-only
program will work flawlessly if invoked simply as
testTerminate.exe
it will run flawlessly using 250 Rexx interpreter instances to execute a simple script (issuing
a single dot on stdout) and then terminate it on the same thread.
... cut ...
_______________________________________________
Oorexx-devel mailing list
Oorexx-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/oorexx-devel