It seems that the following patch fixes the reported crash:
Index: interpreter/runtime/Interpreter.cpp
===================================================================
--- interpreter/runtime/Interpreter.cpp (revision 12998)
+++ interpreter/runtime/Interpreter.cpp (working copy)
@@ -235,11 +235,15 @@
// to shut down so that the package manager can unload
// the libraries (it needs to pass a RexxThreadContext
// pointer out to package unloaders, if they are defined)
- InstanceBlock instance;
+ InstanceBlock tmpInstance;
+ tmpInstance.instance->InterpreterInstance::attachThread();
+
// run whatever uninits we can before we start releasing the
libraries
memoryObject.lastChanceUninit();
PackageManager::unload();
+
+ tmpInstance.instance->InterpreterInstance::detachThread();
}
catch (ActivityException)
{
Any comments, suggestions?
---rony
On 23.12.2023 18:03, Rony G. Flatscher wrote:
Since last August a crash got reported in <https://sourceforge.net/p/oorexx/bugs/1872/> which can
be easily reproduced by following the directions there. Using the supplied simple C++-only program
will work flawlessly if invoked simply as
testTerminate.exe
it will run flawlessly using 250 Rexx interpreter instances to execute a simple script (issuing a
single dot on stdout) and then terminate it on the same thread.
However if supplying an argument to that test program (source got uploaded with #1872 as well),
then the termination of the Rexx interpreter instance occurs on a separate thread and will cause
the crash upon terminating the second Rexx interpreter instance on a separate thread:
testTerminate.exe
anyArgumentSufficesToRunTheRIIOnSeparateThreadAndCauseTheCrash
The relevant Windows code is:
... cut ...
//
from<https://stackoverflow.com/questions/1981459/using-threads-in-c-on-windows-simple-example>,
//
also<https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-createthread>
// #include <windows.h>
DWORD WINAPI ThreadFunc(void* data) {
// Do stuff. This will be the first function called on the new thread.
// When this function returns, the thread goes away. See MSDN for more
details.
RexxInstance *instance = (RexxInstance *) data;
// uncommenting the following statement, makes it run
*// fprintf(stderr, "ThreadFunc(), arrived: terminating instance
[%p]\n",
instance);fflush(stderr); *
// Now wait for the interpreter to terminate and we are done.
instance->Terminate();
// fprintf(stderr, "ThreadFunc(), leaving: terminating instance
[%p]\n", instance);fflush(stderr);
return 0;
... cut ...
if (argc>1) // if argument given terminate on separate thread
{
void *data = interpreter;
HANDLE thread = CreateThread(NULL, 0, ThreadFunc, data, 0,
NULL);
if (thread)
{
fprintf(stderr, "Created [%d], thread [%p] for terminating
instance [%p]\n", i, thread, interpreter);fflush(stderr);
}
}
else // terminate on main thread
{
// Now wait for the interpreter to terminate and we are done.
interpreter->Terminate();
}
... cut ...
Uncommenting the blue fprintf statement makes the crash go away *most* of the time such that all
250 Rexx interpreter instances can be terminated on a separate thread. Reducing that to just
fflush(stderr) makes it go away *most* of the time as well (whereas using fflush(stdout) instead
does not have any effect. So accessing stderr in that thread somehow influences the behaviour).
---
Originally this crash occurred in the context of BSF4ooRexx and I reported it in
<https://sourceforge.net/p/oorexx/feature-requests/806/>. It seems that Rick thought it had to do
with the code in BSF4ooRexx and not with ooRexx. Eventually this thread led to a new, small C++
(supplied with #1872 above) only (only creates a RII, runs a simple Rexx script, terminates the
RII) showcase in which no external dependency could be made responsible for the crash, such that
it becomes clear that the cause lies in ooRexx itself.
As the crash occurs rather early, if using "testTerminate.exe someArg" I had hoped that some of
the C++ gurus would be able to locate (and hopfeully fix) the cause relatively quickly (realizing
that multithreaded bugs can be extremely hard to trace down and to fix). Unfortunately, this has
not happened to this day (after almost 16 months). As this is a showstopper and inhibits using
this great ooRexx feature, namely being able to create any number of different Rexx interpreter
instances (each having its own .local) in a process and terminate them.
Therefore I kindly request any of the C++ experts to look into this matter if possible. Maybe the
upcoming vacation time allows for a few time slots to use for analyzing, finding and hopefully
fixing the cause of this showstopper crash!
---rony
P.S.: Have simplified the test code a little bit, also allowing the test program to run with all
three reported scenarios; it is enclosed as (also slightly stripped down) "testTerminate2.cpp" and
"Makefile2.windows" ("nmake /f Makefil2.windows" will create testTerminate2.exe) which after
successful compilation can be invoked as:
* testTerminate2.exe
o no arguments: will create and terminate all 250 Rexx interpreter
instances (RII) on the
same (main) thread without any crash
* testTerminate2.exe oneArg
o any single argument will cause the crash to always occur while
terminating the *second*
RII on a separate thread (as if terminating the previous RII on a
separate thread left
something in an unbalanced state)
* testTerminate2.exe oneArg secondArg
o any two arguments will cause fflush(stderr) in the terminating thread
to be issued and as
a result most of the time all 250 Rexx interpreter instances can be
terminated on a
separate thread without a crash; running it a couple of times may
however cause that crash
to occur at different RIIs
Terminate() will kick off the garbage collector and the crash seems to occur
during its run.
P.P.S.: Would it be possible to have the termination of a RII carried out on the same thread as
its creation (which seems to run stably) by ooRexx itself (e.g. having a specific thread to create
and to terminate RIIs)?
This may need a function to learn about whether a RII is currently running/blocked and only
terminate it on the creation thread if not in use so not to block the creation thread. Maybe also
(an asynchroneous?) a function to force termination of a RII would be helpful in such a scenario,
maybe with an option that indicates whether running/blocking Rexx code on that RII should be
forcefully halted or not.
_______________________________________________
Oorexx-devel mailing list
Oorexx-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/oorexx-devel