On Mon, 7 Aug 2000, Bill Rebey wrote:
> Date: Mon, 7 Aug 2000 14:25:01 -0400
> From: Bill Rebey <[EMAIL PROTECTED]>
> Reply-To: [EMAIL PROTECTED]
> To: "Openssl-Dev (E-mail)" <[EMAIL PROTECTED]>
> Subject: Crash bug exemplified
>
> The attached program is about as small as I can make a test app that
> exemplifies the problem that my server application is having. I have posted
> about it repeatedly with no results, probably because nobody can (or wants
> to <g>) reproduce it. This little test program is only about 160 lines long
> with comments. It just tries to keep a bunch of transient threads going at
> once (the threads don't do anything - they just exit after sleeping for a
> millisecond).

OK, I have modified the program slightly to build and run on Linux (fairly
standard Red Hat Linux 6.2 on an Intel PII 450 MHz, OpenSSL 0.9.5a; the
compiler was egcs-2.91.66 from the egcs-1.1.2-30 package). The diff is
attached, as is the command line I used to build the program (file
linux_cc2.how).

I am now running two copies of the resulting binary, one for over 36 minutes
(over 4 million threads created in total) and the other for more than 18
minutes (22 MB of output produced so far), and neither has crashed.

So I suspect the problem lies either in SPARC-specific code (I believe the
UltraSPARC is a 64-bit CPU, and I only have 32-bit single-CPU machines
around), in the build process used by the author of the test program (or in
the Solaris libraries), or in the program's behaviour on multi-CPU machines
(which I cannot test right now).

The author's link line has -mt at the very end:
CC -o tst tst.o -L$OPENSSLDIR/lib -lssl -lcrypto -lsocket -mt
Is this correct? Any ideas?
>
> <<comp>> <<link>> <<tst.cpp>>
> This problem happens on SPARC Solaris. This program demonstrates the
> problem very quickly (usually within a minute) on both a SPARC Ultra-2 with
> Solaris 2.6, and a SPARC Ultra-60 with Solaris 2.8. My "real" app doesn't
> crash nearly this fast, as it doesn't put nearly the stress-test on OpenSSL
> that the test app does - but it most certainly crashes every time I test it;
> it just takes hours instead of seconds.
>
> Can anyone reproduce this and fix it? I'm in a VERY bad spot here because I
> can't ship my product until I get OpenSSL to work. My company pretty much
> threw sand in RSA's face in favor of using OpenSSL, on my recommendation,
> and now I can't make OpenSSL work and we can't ship my product. This is
> hardly a great career move for me. If anyone can identify and fix this bug,
> I would greatly appreciate it. I look pretty stupid right now to the folks
> in upper management, and I feel like my hands are tied. I'm trying to use
> Purify to determine the problem, but I've never used it before and will
> probably be slow to figure out how to make it work and understand exactly
> what it's telling me.
>
> If anyone sees any obvious misuse problem, PLEASE let me know. I would LOVE
> to hear "you're doing it wrong - you forgot to make this function call!" and
> be done with it, but as far as I can tell, I'm obeying the OpenSSL usage
> laws to the letter.
>
> If you run the "comp" and "link" scripts to build this little test program,
> then run the resultant "tst" executable, it should crash after a short time
> and if you run dbx against the resultant core, you should get the following
> stack in response to the dbx "where" command:
>
> core file header read successfully
> Reading ld.so.1
> Reading libsocket.so.1
> Reading libCrun.so.1
> Reading libm.so.1
> Reading libw.so.1
> Reading libthread.so.1
> Reading libc.so.1
> Reading libnsl.so.1
> Reading libdl.so.1
> Reading libmp.so.2
> Reading libc_psr.so.1
> detected a multithreaded program
> t@3937 (l@48) terminated by signal BUS (invalid address alignment)
> Current function is ThreadMain
> 100 int iErr = ERR_get_error ();
> (/opt/SUNWspro/bin/../WS6/bin/sparcv9/dbx) where
> current thread: t@3937
> [1] t_delete(0x9, 0xff2b6000, 0x150, 0x65300, 0x651a8, 0x150), at 0xff241798
> [2] realfree(0x9, 0xff2bc7b0, 0xff2b6000, 0x65300, 0x153, 0x65308), at 0xff241420
> [3] cleanfree(0x0, 0xff2b6000, 0xff2bc724, 0xff2bc7a4, 0xff2bc730, 0x0), at 0xff241cb4
> [4] _malloc_unlocked(0x60, 0x0, 0xff2b6000, 0x60, 0x5, 0x0), at 0xff240e20
> [5] malloc(0x60, 0x60, 0x62798, 0x150, 0x0, 0x0), at 0xff240d3c
> [6] CRYPTO_malloc(0x5a5b0, 0x470d0, 0x77, 0x5a400, 0x470d0, 0x60), at 0x17070
> [7] lh_new(0x1cba0, 0x1cbb8, 0x470d0, 0x2be, 0x1cbb8, 0x14c), at 0x34604
> [8] ERR_get_state(0x5a400, 0x0, 0x673e0, 0x430d8, 0x673e0, 0xf7509b28), at 0x1ce6c
> [9] get_error_values(0x1, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x1c4a0
> =>[10] ThreadMain(pNothing = (nil)), line 100 in "tst.cpp"
> (/opt/SUNWspro/bin/../WS6/bin/sparcv9/dbx) quit
>
> Thanks for your help,
>
> Bill Rebey
>
>
>
>
Regards,
Wojtek
g++ -Wall -I/usr/local/ssl/include -L/usr/local/ssl/lib -g -D_REENTRANT \
    -DNO_RSA -DNO_RC4 -DNO_RC5 -DNO_BF -DNO_IDEA -fstack-check \
    -o tst2 tst2.cc -lssl -lpthread -lcrypto
--- tst.cpp Wed Aug 9 12:50:06 2000
+++ tst2.cc Wed Aug 9 13:36:57 2000
@@ -16,6 +16,7 @@
\*==============================================================================================*/
#include <pthread.h>
+#include <sys/time.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
@@ -143,8 +144,9 @@
++_iThreadCnt;
++_iTotThreads;
_cCritSec.Leave ();
+ pthread_t threadID;
- if (pthread_create (NULL, &_threadAttr, ThreadMain, NULL))
+ if (pthread_create (&threadID, &_threadAttr, ThreadMain, NULL))
{
printf ("\nERROR CREATING THREAD!\n");
}
@@ -159,3 +161,5 @@
}
}
}
+
+// vim:sw=4 ts=4