I'm still getting bus errors from lh_delete.
I'm finding it very easy to reproduce the error, although I can't force it
to happen at a particular point in time or on a particular invocation of the
offending function. Sometimes it happens in the first few hundred calls,
and sometimes it happens after tens of thousands.
The dbx stack trace below shows where the error occurs. This is happening
at thread exit. Just before a thread exits, I make the following sequence
of calls:
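unsigned long iErr = ERR_get_error ();  /* oldest queued error code; 0 if none */
ERR_error_string (iErr, buf);           /* buf must hold at least 120 bytes */
ERR_reason_error_string (iErr);         /* returned string is discarded */
ERR_remove_state (0);                   /* free this thread's error queue */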
The call to ERR_remove_state (0) is the one that sporadically causes bus
errors. I am using locks and locking callbacks, and they all appear to be
working correctly. If I take out the ERR_remove_state (0) call, my program
never crashes.
The test that I am running starts and stops roughly 12 threads and TCP/IP
connections a second. (Specifically, the test software starts a thread that
connects via TCP/IP to another copy of itself, lets the thread run for
a random period of time between 0 and 5 seconds, and then closes the
socket connection and exits. The main loop of the test program strives to
keep 60 of these threads running at once, so we get about 12 threads a
second stopping, and 12 threads a second starting to replace the ones that
stop.)
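A scaled-down version of the churn test described above might look like the following. This is a hypothetical sketch, not the actual test program: the names run_churn and churn_worker are made up, each worker only sleeps instead of holding a TCP/IP connection, and the comment in the worker marks where per-thread cleanup such as ERR_remove_state (0) would be called. It assumes POSIX threads, as on Solaris.

```c
#define _POSIX_C_SOURCE 200809L  /* for nanosleep() */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical, scaled-down churn harness: keep `target_live` worker
 * threads running until `total` workers have been started and have exited. */

static pthread_mutex_t churn_mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  churn_cv = PTHREAD_COND_INITIALIZER;
static int churn_live = 0;      /* workers currently running */
static int churn_finished = 0;  /* workers that have exited */

static void *churn_worker(void *arg)
{
    long ms = *(long *)arg;     /* lifetime chosen by the main loop */
    free(arg);

    /* Stands in for the lifetime of one connection. */
    struct timespec ts = { .tv_sec = ms / 1000,
                           .tv_nsec = (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);

    /* Per-thread cleanup, e.g. ERR_remove_state (0), would go here. */

    pthread_mutex_lock(&churn_mu);
    churn_live--;
    churn_finished++;
    pthread_cond_broadcast(&churn_cv);  /* a slot is free again */
    pthread_mutex_unlock(&churn_mu);
    return NULL;
}

/* Returns how many workers ran to completion. */
int run_churn(int target_live, int total, long max_lifetime_ms)
{
    assert(target_live > 0 && max_lifetime_ms >= 0);
    pthread_mutex_lock(&churn_mu);
    churn_live = 0;
    churn_finished = 0;
    pthread_mutex_unlock(&churn_mu);

    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);

    for (int started = 0; started < total; started++) {
        long *ms = malloc(sizeof *ms);
        if (ms == NULL)
            break;
        *ms = rand() % (max_lifetime_ms + 1);  /* random lifetime, 0..max ms */

        /* Block until fewer than target_live workers are running. */
        pthread_mutex_lock(&churn_mu);
        while (churn_live >= target_live)
            pthread_cond_wait(&churn_cv, &churn_mu);
        churn_live++;
        pthread_mutex_unlock(&churn_mu);

        pthread_t tid;
        if (pthread_create(&tid, &attr, churn_worker, ms) != 0) {
            free(ms);                          /* undo the slot reservation */
            pthread_mutex_lock(&churn_mu);
            churn_live--;
            pthread_mutex_unlock(&churn_mu);
            break;
        }
    }
    pthread_attr_destroy(&attr);

    /* Wait for the remaining workers to exit. */
    pthread_mutex_lock(&churn_mu);
    while (churn_live > 0)
        pthread_cond_wait(&churn_cv, &churn_mu);
    int done = churn_finished;
    pthread_mutex_unlock(&churn_mu);
    return done;
}
```

Called as run_churn (60, 100000, 5000), it would roughly mirror the load described above (60 live threads, lifetimes of 0 to 5 seconds); small arguments are enough for a quick check that the harness itself is sound.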
It may be significant to note that the threads don't even have to actually
establish any SSL connections for the problem to occur. Even if I just
initialize the OpenSSL library and use clear-text sockets for the test, the
problem happens. [It also happens when I DO use SSL connections].
It looks as though a lock call is missing somewhere in the lh (hash table) code.
Any ideas or fixes?
[1] lh_delete(0x0, 0xe200f8f4, 0xb51f0, 0x28e, 0xe200fa38, 0xdac18), at 0x55738
[2] ERR_remove_state(0x0, 0xcf000, 0x0, 0xb51f0, 0x0, 0x0), at 0x56fc4
[3] 0xef736d3c(0x240140, 0xef585250, 0xef745708, 0xef745728, 0xef74573b, 0xd8), at 0xef736d3b
=>[4] __cPSPThread::~__cPSPThread(this = 0x240140), line 433 in "cPSPThread.cpp"
[5] __SLIP.DELETER__A(0x240140, 0x1, 0xd9d80, 0x1, 0x1, 0x0), at 0xef76fcc0
[6] __cPSPThread::ThreadMain(this = 0x240140), line 579 in "cPSPThread.cpp"
[7] ThreadRootStartingPoint(pThreadInstance = 0x240140), line 74 in "cThread.cpp"
Thanks!
Bill Rebey
-----Original Message-----
From: Bill Rebey
Sent: Tuesday, July 18, 2000 1:26 PM
To: [EMAIL PROTECTED]
Subject: RE: Bus Error
Thanks for the reply. I am already using both of your suggestions. The
locking callbacks appear to be working fine. Here's why I think that.
First, I am cleaning up each thread as it exits by issuing the following
sequence of calls (is this what I SHOULD be doing?):
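unsigned long iErr = ERR_get_error ();  /* oldest queued error code; 0 if none */
ERR_error_string (iErr, buf);           /* buf must hold at least 120 bytes */
ERR_reason_error_string (iErr);         /* returned string is discarded */
ERR_remove_state (0);                   /* free this thread's error queue */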
If I remove this group of calls, I can process hundreds of thousands of
connections without a hitch (apart from leaking memory). Those connections
are spread across hundreds of thousands of threads, too. [Each new
connection gets a new thread, and my test software strives to keep 60 of
them alive at all times.] That all seems to indicate that the threading and
locking are working fine. As soon as I put the "cleanup" calls above back
in, though, I get a crash, as shown in the stack trace and message below.
Any idea what's wrong?
Bill Rebey
-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]]
Sent: Thursday, July 13, 2000 4:54 PM
To: [EMAIL PROTECTED]; Bill Rebey
Subject: Re: Bus Error
Bill Rebey <[EMAIL PROTECTED]>:
> On a SPARC running Solaris 2.7, my application crashed with a bus error.
>
> Here is the stack trace from the core:
>
> [1] lh_retrieve(0x63150, 0x522c7, 0xf4624, 0x61737465, 0xcbb0588c, 0xf8220), at 0x61be0
> [2] ERR_get_state(0xdf000, 0xc5cf8, 0x6523, 0x0, 0xef5d5250, 0x3466), at 0x63340
> [3] get_error_values(0x1, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x62b68
[...]
> Significant notes:
>
> This code didn't cause a problem until about the 8,000th call to it (as you
> can see the code is in a destructor for a Thread class that I wrote; about
> 8,000 threads were successfully started and stopped before this one
> crashed).
>
> The SSL library was never initialized. Why?? Because in my little world,
> sockets know about SSL things; threads do not. SSL has nothing to do with
> threads, as threads can be used for anything. [...]
If you use OpenSSL with multi-threading, you *have* to provide
locking callback functions for synchronizing access to global
(and shared) data structures.
See <URL: http://www.openssl.org/docs/crypto/thread.html>.
Also you should call ERR_remove_state(0) in each thread that is about
to be terminated, but failure to do so should only result in memory
leaks and not in a crash.
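As an illustration of the kind of global, shared data structure those locking callbacks protect, here is a hypothetical model (not OpenSSL's code) of a table of per-thread state keyed by a thread ID and guarded by one mutex: get_state looks up or creates the calling thread's entry, and remove_state unlinks and frees it just before the thread exits. The names thread_state, get_state and remove_state are made up. The model also shows why the key must be unique per thread: if two threads ever used the same ID, one thread's remove_state would free memory the other is still using. The threads documentation referenced above also describes an id callback (CRYPTO_set_id_callback in OpenSSL releases of this era) for platforms where getpid() returns the same value in every thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical model, not OpenSSL's implementation: a global table of
 * per-thread state, keyed by thread ID and guarded by one mutex. */

#define MAX_STATES 64

typedef struct {
    unsigned long id;   /* key: ID of the owning thread */
    int n_errors;       /* stand-in for a per-thread error queue */
} thread_state;

static pthread_mutex_t table_mu = PTHREAD_MUTEX_INITIALIZER;
static thread_state *table[MAX_STATES];   /* NULL marks an empty slot */

/* Return the state for `id`, creating it on first use.
 * Returns NULL if the table is full or allocation fails. */
thread_state *get_state(unsigned long id)
{
    thread_state *s = NULL;
    pthread_mutex_lock(&table_mu);
    for (int i = 0; i < MAX_STATES; i++) {
        if (table[i] != NULL && table[i]->id == id) {
            s = table[i];
            break;
        }
    }
    if (s == NULL) {
        for (int i = 0; i < MAX_STATES; i++) {
            if (table[i] == NULL) {
                s = calloc(1, sizeof *s);
                if (s != NULL) {
                    s->id = id;
                    table[i] = s;
                }
                break;
            }
        }
    }
    pthread_mutex_unlock(&table_mu);
    assert(s == NULL || s->id == id);   /* a returned entry belongs to id */
    return s;
}

/* Unlink and free the state for `id`; meant to be called once by the
 * owning thread just before it exits. Afterwards any pointer obtained
 * from get_state(id) is dangling, so this is safe only if no other
 * thread uses the same id. */
void remove_state(unsigned long id)
{
    thread_state *s = NULL;
    pthread_mutex_lock(&table_mu);
    for (int i = 0; i < MAX_STATES; i++) {
        if (table[i] != NULL && table[i]->id == id) {
            s = table[i];
            table[i] = NULL;
            break;
        }
    }
    pthread_mutex_unlock(&table_mu);
    free(s);   /* free(NULL) is a no-op when there was no entry */
}
```

In this model, skipping remove_state only leaks the entry, which matches the point above that omitting the per-thread cleanup should cost memory rather than cause a crash.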
______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List [EMAIL PROTECTED]
Automated List Manager [EMAIL PROTECTED]
______________________________________________________________________