[Oorexx-devel] Probably a fix (Re: Further observations (Re: Some new insights (Re: Some observations and a test program ... (Re: Request for fixing crash reported in bug item #1872

Rony G. Flatscher Mon, 11 Aug 2025 05:25:42 -0700

This patch makes the crashes go away with the test program:


   Index: interpreter/concurrency/ActivityManager.cpp
   ===================================================================
   --- interpreter/concurrency/ActivityManager.cpp      (revision 13003)
   +++ interpreter/concurrency/ActivityManager.cpp      (working copy)
   @@ -568,6 +568,8 @@
      */
     void ActivityManager::clearActivityPool()
     {
   +    ResourceSection lock;                // critical section, protect 
interaction with activity queues
   +
         Activity *activity = (Activity *)availableActivities->pull();
         while (activity != OREF_NULL)
         {
   @@ -591,6 +593,8 @@
      */
     bool ActivityManager::poolActivity(Activity *activity)
     {
   +    // ResourceSection lock held by caller: bool 
InterpreterInstance::poolActivity(Activity *activity
   +
         // are we shutting down or have too many threads in the pool?
         if (processTerminating || availableActivities->items() > 
MAX_THREAD_POOL_SIZE)
         {

clearActivityPool() was not sheltered as a critical section by a ResourceSection lock. The testprogram now runs without crashes and a magnitude or two faster than before, which is a very pleasantside effect.

As running the test suite works with no crashes, I will apply the patch to trunk such that Jenkinscan produce and test Rexx interpreters on all of the Jenkins platforms.

---

A request in this context: debugging this has been an enormous, almost irrational effort, also,because there is important conceptual information about the interpreter missing for those who havenot created it, hence supplying this information is important to make sure that future generationscan learn, understand the architecture quickly to allow newcomers to maintain and/or extend theinterpreter for good! E.g. the ResourceSection serializes access to the various activity queueswhich is important, however from the name this is not clear and there is no comment that explainsthis, at least I did not find any. Or, what do the different locks shelter, serialize for whatreason. Or, which locks can be used in which sequence (this is buried/scattered in comments in thesource code, yet, getting quickly a conceptual overview of the interpreter would be possible, ifthere was a conceptual overview of the architecture and most important parts of the interprter,including a conceptual overview about flattening, rxapi and the like). This would benefit anyone inthis community who has a desire to understand the interpreter in an easy, quick manner!


---rony


On 06.08.2025 16:56, Rony G. Flatscher wrote:

Continuing to experiment with Sleep(23) to see where it would excercise a positive effect if notcrashing immediately at terminating the second instance on another thread:
  * placed "Sleep()" in "Activity *ActivityManager.cpp::getRootActivity()"

      o doing a "Sleep(23);" before the statement "Activity *oldActivity = 
findActivity();":
          + crashes at terminating the second instance a couple of times, but 
eventually it ran to
            the end, creating 5,000 instances and terminating each on a 
separate thread

      o moving the "Sleep(23);" before the statement "Activity *activityObject =
        createCurrentActivity();":
          + same as above, a crash terminating the second instance on a 
separate thread,
            re-running the program in the same session may yield success in 
creating and
            terminating 5,000 instances

  * placed "Sleep()" in "Activity *ActivityManager::createCurrentActivity()"

      o doing a "Sleep(23);" before the statement "Activity *activity = new 
Activity(p, false);":
          + same as above, crashes at terminating the second instance a couple 
of times, but
            eventually it ran to the end, creating 5,000 instances and 
terminating each on a
            separate thread

      o moving the "Sleep(23);" before the statement 
"allActivities->append(activity);":
          + crashed once terminating the second instance on a separate thread, 
re-running the
            compiled program in the same session yields success in creating and 
terminating all
            5,000 instances, running it on new sessions succeeds!
So the Sleep(23) has an effect while appending the new activity to the allActivities queue increateCurrentActivity(), as if concurrent access to the allActivities queue (a QueueClass) maysomehow cause a problem (the test program creates 5,000 interpreter instances and uses them on themain thread to output a dot to stdout and immediately after using it, the instance gets terminatedon a separate thread).
Since placing the Sleep(23) right in front of allActivities->append(activity) increateCurrentActivity(), the test program seems to succeed when run in a separate, new sessionrepeatedly.
Any ideas, suggestions?

---rony
P.S.: Enclosed the test program as used for these tests: "nmake Makefile2.windows" and"testTerminate2.exe 1".
6:12


On 05.08.2025 12:51, Rony G. Flatscher wrote:
It seems that the crash can be delayed by a couple of thousand Rexx interpreter instances ifadding (Windows) "Sleep(20);" right before "rootActivity = ActivityManager::getRootActivity();"in Interpreter.cpp's "InterpreterInstance *Interpreter::createInterpreterInstance(RexxOption*options)".
Although this is a huge improvement to crashing after three interpreter instance creations afterhaving terminated one interpreter instance on another thread, it does not solve the problem. So areal fix is needed (the problem may occur on any host application that wishes to employ ooRexxscripts as macros, potentially creating and terminating interpreter instances on differentthreads, and a crashing ooRexx would crash the entire process and hence the host application,which is not acceptable for companies to employ ooRexx as a scripting language for theirapplications, which would be really a boon for many).
Ultimately, it seems, the crash is linked to invoking "Activity*ActivityManager::createCurrentActivity()" in ActivityManager.cpp. createCurrentActivity() getsinvoked via createInterpreterInstance()'s invocation of getRootActivity(). The root cause forthe crash seems to be that the interpreter instance gets created sometimes on the wrongActvity/thread. Not sure why doing a Sleep(20) before invoking getRootActivity() has such adramatic positive effect.
[The other path to invoking createCurrentActivity() is via "Activity*ActivityManager::attachThread()". Note, the test program does not cause attachThread() to beinvoked as it immediately uses the created interpreter instance to create and run a routine andthen terminates it on a separate thread.]
Not sure how to proceed. Any ideas, suggestions?

---rony


On 30.07.2025 16:50, Rony G. Flatscher wrote:
Using this e-mail thread to report new insights.
Prolog: first of all, it is a little bit bewitched: sometimes it works, sometimes it doesn't.Then, after many, many hours of all sort of debuggings I arrived at a situation where I couldreproduce the crash. And then, after further inspections and debug output edits, the crasheswould disappear and it took me again hours to become able to get to a version that would crashin a certain constellation. Because of this I do not dare to edit (make it better readable) thesource of the test program as that caused the vanishment of the crash.
So enclosed you will find the patch "getInterpreterInfosWithIds.diff" that produces debuginformation for interpreter instance creations and terminations. The enclosed program"testTerminate2.cpp" can be compiled with "nmake Makefile2.windows" and produces"testTerminate2.exe".
What is the purpose of testTerminate2.cpp?

  * It creates an interpreter instance and uses it to display the ooRexx 
version and then
    terminates it, followed by a loop which gets carried out five times in 
which a Rexx
    interpreter instance gets created, and a little Rexx program
    ("tid=.context~thread;.error~charOut('<{tid='tid'}>');") gets executed on 
the main thread.

  * Depending on the command line arguments the termination of each Rexx 
interpreter instance in
    the loop occurs:
      o (case 1) on the main thread, if no arguments are supplied: this works 
flawlessly

      o on a separate thread, if one or two arguments are supplied: this may 
crash or succeed, if

          + (case 2) one argument gets supplied: this crashes in "bool
            InterpreterInstance::terminate()" after invoking the statement:
            "memoryObject.collectAndUninit(Interpreter::lastInstance());"

          + (case 3) two arguments get supplied: this works MOST of the time, 
and if it crashes
            it does so as in (case 2) above
With the enclosed version I have not yet managed to crash (case 3) which seems to be just amatter of time/luck.
However with (case 2) I get the reported crash. The difference between (case 2) and (case 3),which both terminate the Rexx interpreter instance on a separate thread, is "harmless", as theonly difference in (case 3) is that it carries out a "fflush(stderr)" before terminating theinstance (not sure why that has such a significant effect in this case).
Here the output of running the three cases:

  * (case 1) output:
    G:\tmp\orx\bugs\crashOnTerminate20250729>rxenv testTerminate2.exe
    *** main() - entered testTerminate2.main() ***

    
<IntInst::inititalize#175/this=I1/thread=T1><IntInst::terminate#529\this=I1\thread=T1>

    <IntInst::inititalize#175/this=I2/thread=T1>Created interpreter instance 
version=5.2.0 language level=6.05
    ... terminating this interpreter instance 
...<IntInst::terminate#529\this=I2\thread=T1>

    
<IntInst::inititalize#175/this=I3/thread=T1><IntInst::terminate#529\this=I3\thread=T1>
      done.

    There is [1] argument, therefore:
             testTerminate2.exe: creating total of [5] interpreters, running 
[tid=.context~thread;.error~charOut('<{tid='tid'}>');]
                     terminating on SAME thread


    
<IntInst::inititalize#175/this=I4/thread=T1><{tid=1}><IntInst::terminate#529\this=I4\thread=T1>

    
<IntInst::inititalize#175/*this=I5/thread=T1*><IntInst::terminate#529\this=I5\thread=T1>

    
<IntInst::inititalize#175/this=I6/thread=T1><{tid=1}><IntInst::terminate#529\this=I6\thread=T1>

    
<IntInst::inititalize#175/*this=I7/thread=T1*><IntInst::terminate#529\this=I7\thread=T1>

    
<IntInst::inititalize#175/this=I8/thread=T1><{tid=1}><IntInst::terminate#529\this=I8\thread=T1>

    
<IntInst::inititalize#175/*this=I9/thread=T1*><IntInst::terminate#529\this=I9\thread=T1>

    
<IntInst::inititalize#175/this=I10/thread=T1><{tid=1}><IntInst::terminate#529\this=I10\thread=T1>

    
<IntInst::inititalize#175/*this=I11/thread=T1*><IntInst::terminate#529\this=I11\thread=T1>

    
<IntInst::inititalize#175/this=I12/thread=T1><{tid=1}><IntInst::terminate#529\this=I12\thread=T1>

    
<IntInst::inititalize#175/*this=I13/thread=T1*><IntInst::terminate#529\this=I13\thread=T1>

    ---
    *** main() - done: created [5] instances, ran 
[tid=.context~thread;.error~charOut('<{tid='tid'}>');], each terminated on the 
SAME thread
    ---
    *** main() - exiting testTerminate2.main() after Sleep(500); ***


  * (case 2) output:
    G:\tmp\orx\bugs\crashOnTerminate20250729>rxenv testTerminate2.exe 1
    *** main() - entered testTerminate2.main() ***

    
<IntInst::inititalize#175/this=I1/thread=T1><IntInst::terminate#529\this=I1\thread=T1>

    <IntInst::inititalize#175/this=I2/thread=T1>Created interpreter instance 
version=5.2.0 language level=6.05
    ... terminating this interpreter instance 
...<IntInst::terminate#529\this=I2\thread=T1>

    
<IntInst::inititalize#175/this=I3/thread=T1><IntInst::terminate#529\this=I3\thread=T1>
      done.

    There are [2] arguments, therefore:
             testTerminate2.exe: creating total of [5] interpreters, running 
[tid=.context~thread;.error~charOut('<{tid='tid'}>');]
                     terminating on SEPARATE thread


    <IntInst::inititalize#175/this=I4/thread=*T1*><{tid=1}>
    Loop # [1], created thread [0000000000000210] for terminating instance ...
    <IntInst::terminate#529\this=I4\thread=*T2*>

    
<IntInst::inititalize#175/*this=I5/thread=T2*><IntInst::terminate#529\this=I5\thread=*T2*>

    <IntInst::inititalize#175/this=I6/thread=T1><{tid=1}>
    Loop # [2], created thread [0000000000000214] for terminating instance ...
    <IntInst::terminate#529\this=I6\thread=T3>

    G:\tmp\orx\bugs\crashOnTerminate20250729>/** CRASH ** in 
memoryObject.collectAndUninit(Interpreter::lastInstance()); /
  * (case 3) output:
    G:\tmp\orx\bugs\crashOnTerminate20250729>rxenv testTerminate2.exe 1 1
    *** main() - entered testTerminate2.main() ***

    
<IntInst::inititalize#175/this=I1/thread=T1><IntInst::terminate#529\this=I1\thread=T1>

    <IntInst::inititalize#175/this=I2/thread=T1>Created interpreter instance 
version=5.2.0 language level=6.05
    ... terminating this interpreter instance 
...<IntInst::terminate#529\this=I2\thread=T1>

    
<IntInst::inititalize#175/this=I3/thread=T1><IntInst::terminate#529\this=I3\thread=T1>
      done.

    There are [3] arguments, therefore:
             testTerminate2.exe: creating total of [5] interpreters, running 
[tid=.context~thread;.error~charOut('<{tid='tid'}>');]
                     terminating on SEPARATE thread (doing a 'fflush(stderr)' 
in termination thread)


    <IntInst::inititalize#175/this=I4/thread=T1><{tid=1}>
    Loop # [1], created thread [0000000000000210] for terminating instance ...

    
<IntInst::inititalize#175/this=I5/thread=T1><IntInst::terminate#529\this=I4\thread=T2>
    <{tid=1}>
    Loop # [2], created thread [000000000000022C] for terminating instance ...

    
<IntInst::inititalize#175/this=I6/thread=T1><IntInst::terminate#529\this=I5\thread=T3>
    <{tid=1}>
    Loop # [3], created thread [0000000000000218] for terminating instance ...

    
<IntInst::inititalize#175/this=I7/thread=T1><IntInst::terminate#529\this=I6\thread=T4>
    <{tid=1}>
    Loop # [4], created thread [0000000000000224] for terminating instance ...

    
<IntInst::inititalize#175/this=I8/thread=T1><IntInst::terminate#529\this=I7\thread=T5>
    <{tid=1}>
    Loop # [5], created thread [0000000000000258] for terminating instance ...

    ---
    *** main() - done: created [5] instances, ran 
[tid=.context~thread;.error~charOut('<{tid='tid'}>');], each terminated on 
SEPARATE threads
    ---
    <IntInst::terminate#529\this=I8\thread=*T6*>

    
<IntInst::inititalize#175/*this=I9/thread=T6*><IntInst::terminate#529\this=I9\thread=*T6*>
    *** main() - exiting testTerminate2.main() after Sleep(500); ***
The creation of the Rexx interpreter instances by the test program occurs on thread T1 on whichthe Rexx program gets carried out (producing the string "<{tid=1}>").
(case 1) creates an interpreter in the loop and terminates it on the same thread. Creating andterminating another interpreter instance thereafter works without a crash.
The difference between (case 2) which causes a crash and (case 3) seems to be the fact, that(case 2) shuts down with I5 on T2, but continues to create a new interpreter I6 on T1,terminates it on T3 and there the crash occurs (in the memoryObject.collectAndUninit() call branch).
(case 3) on the other hand does not shut down the interpreter, except after the last createdinterpreter I8 got terminated on T6, causing I9 on T6 to shut down, which works without a crashin this case.
Maybe these observation help shed light on what might be causing the crash.

---rony


On 23.12.2023 18:03, Rony G. Flatscher wrote:
Since last August a crash got reported in <https://sourceforge.net/p/oorexx/bugs/1872/> whichcan be easily reproduced by following the directions there. Using the supplied simple C++-onlyprogram will work flawlessly if invoked simply as
    testTerminate.exe
it will run flawlessly using 250 Rexx interpreter instances to execute a simple script (issuinga single dot on stdout) and then terminate it on the same thread.
... cut ...

_______________________________________________
Oorexx-devel mailing list
Oorexx-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/oorexx-devel

[Oorexx-devel] Probably a fix (Re: Further observations (Re: Some new insights (Re: Some observations and a test program ... (Re: Request for fixing crash reported in bug item #1872

Reply via email to