This is from yesterday. Apparently I only sent this to Chris rather than
the list, as I didn't see it on the List. Apologies if you've already seen
it. It's relevant to my posting of a few minutes ago.
--------------------
The problem appears to be (from visual inspection - I didn't try to watch
thread_hash in a debugger) that ERR_remove_state (crypto/err/err.c)
interrogates thread_hash without a Lock. I'm guessing that at the top of
the function, thread_hash is non-null, but since the function isn't Locking
its access to thread_hash, by the time it gets around to calling lh_delete,
thread_hash is, in fact, NULL, and we blow up in lh_delete.
Here's the original code, with a proposed fix below:
void ERR_remove_state(unsigned long pid)
{
ERR_STATE *p,tmp;
if (thread_hash == NULL)
return;
if (pid == 0)
pid=(unsigned long)CRYPTO_thread_id();
tmp.pid=pid;
CRYPTO_w_lock(CRYPTO_LOCK_ERR);
p=(ERR_STATE *)lh_delete(thread_hash,&tmp);
if (lh_num_items(thread_hash) == 0)
{
/* make sure we don't leak memory */
lh_free(thread_hash);
thread_hash = NULL;
}
CRYPTO_w_unlock(CRYPTO_LOCK_ERR);
if (p != NULL) ERR_STATE_free(p);
}
I would propose this change:
void ERR_remove_state(unsigned long pid)
{
ERR_STATE *p,tmp;
CRYPTO_w_lock(CRYPTO_LOCK_ERR);
if (thread_hash)
{
if (pid == 0)
pid=(unsigned long)CRYPTO_thread_id();
tmp.pid=pid;
p=(ERR_STATE *)lh_delete(thread_hash,&tmp);
if (lh_num_items(thread_hash) == 0)
{
/* make sure we don't leak memory */
lh_free(thread_hash);
thread_hash = NULL;
}
}
CRYPTO_w_unlock(CRYPTO_LOCK_ERR);
if (p != NULL) ERR_STATE_free(p);
}
Does anyone agree or disagree that the first version is wrong, and is the
source of my crashes, and that the second version solves the problem?
Bill Rebey
-----Original Message-----
From: Chris Zimman [mailto:[EMAIL PROTECTED]]
Sent: Monday, July 24, 2000 12:34 PM
To: Bill Rebey
Subject: Re: Bus Errors
>
> int iErr = ERR_get_error ();
> ERR_error_string (iErr, buf);
> ERR_reason_error_string (iErr);
> ERR_remove_state (0);
>
> The call to ERR_remove_state (0) is the one that sporadically causes bus
> errors. I am using locks and locking callbacks, and they all seem to be
> working just fine. If I take out the ERR_remove_state (0); call, things
> appear to work just fine and my program never crashes.
> [1] lh_delete(0x0, 0xe200f8f4, 0xb51f0, 0x28e, 0xe200fa38, 0xdac18), at
> 0x55738
> [2] ERR_remove_state(0x0, 0xcf000, 0x0, 0xb51f0, 0x0, 0x0), at 0x56fc4
As is indicated by your stack trace, lh_delete() is being passed a NULL
pointer which would cause it to crash almost immediately after going into
lh_delete(), since there's an access via it:
crypto/lhash/lhash.c:223
lh->error = 0;
What's interesting is that thread_hash is a static global in
crypto/err/err.c -- I'm missing where this is getting set to NULL as
there's a check for this in ERR_remove_state() and there seem to be
appropriate locks around all functions that modify this.
Can you set a watch to see where thread_hash is being set to NULL? This
would probably provide some further insight.
--Chris
______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List [EMAIL PROTECTED]
Automated List Manager [EMAIL PROTECTED]