Nauman Akbar wrote:
Dear Users

I have been having this problem for a long time. Initially I thought it was an issue with the multi-threading configuration, but the problem remains even with multi-threading removed.

I have developed a simple SSL-based multi-threaded server application. Previously, OpenSSL data was shared among threads, but now all SSL functions are performed in a single thread. I am developing this application on RH9 using OpenSSL 0.9.7a. There is only one client connecting to this server, always using the same credentials. Both client and server use only ADH with SSLv3.

The problem is that SSL_accept sometimes fails, completely at random, taking down the server with it. It may be a segmentation fault or some other exception. Since I am connecting to the machine remotely, it is not possible for me to monitor the application at all times (although I have tried). This is why I don’t know for certain what error is generated when the server application crashes.
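One way to find out what kills an unattended process is to trap the fatal signals and log a backtrace before the process dies. The following is only a minimal sketch, assuming glibc on Linux (`backtrace` and `backtrace_symbols_fd` from `<execinfo.h>`). The names `signal_name`, `write_str`, `crash_handler` and `install_crash_handlers` are hypothetical and not part of the application described here. SIGPIPE is in the list only because its default action terminates a process silently; that is an assumption about a possible cause, not a diagnosis.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <execinfo.h>   /* backtrace, backtrace_symbols_fd (glibc-specific) */
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: printable names for the signals trapped below. */
static const char *signal_name(int sig)
{
    switch (sig) {
    case SIGSEGV: return "SIGSEGV";
    case SIGBUS:  return "SIGBUS";
    case SIGFPE:  return "SIGFPE";
    case SIGILL:  return "SIGILL";
    case SIGABRT: return "SIGABRT";
    case SIGPIPE: return "SIGPIPE";
    default:      return "unknown signal";
    }
}

/* Hypothetical helper: write a string to stderr using only write(2). */
static void write_str(const char *s)
{
    ssize_t r = write(STDERR_FILENO, s, strlen(s));
    (void)r;  /* nothing useful can be done if logging fails here */
}

/* Log the signal and a raw backtrace, then die with the default action. */
static void crash_handler(int sig)
{
    void *frames[64];
    int n = backtrace(frames, 64);

    write_str("fatal signal: ");
    write_str(signal_name(sig));
    write_str("\n");
    backtrace_symbols_fd(frames, n, STDERR_FILENO);

    /* SA_RESETHAND has already restored the default disposition and
     * SA_NODEFER leaves the signal unblocked, so this terminates the
     * process (with a core dump where one is enabled). */
    raise(sig);
}

/* Install the handler; returns 0 on success, -1 on failure. */
static int install_crash_handlers(void)
{
    static const int sigs[] = {
        SIGSEGV, SIGBUS, SIGFPE, SIGILL, SIGABRT, SIGPIPE
    };
    void *warmup[1];
    struct sigaction sa;
    size_t i;

    /* Call backtrace() once up front so any lazy loading of the
     * unwinder happens here and not inside the signal handler. */
    backtrace(warmup, 1);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = crash_handler;
    sa.sa_flags = SA_RESETHAND | SA_NODEFER;
    sigemptyset(&sa.sa_mask);

    for (i = 0; i < sizeof sigs / sizeof sigs[0]; i++) {
        if (sigaction(sigs[i], &sa, NULL) != 0)
            return -1;
    }
    return 0;
}
```

Calling `install_crash_handlers()` once at the start of `main()` and redirecting stderr to a file (for example `./server 2>> crash.log`) would leave a record of the signal and the call stack after each crash. Linking with `-rdynamic` makes the logged frames show function names where glibc can resolve them.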

One thing is always common. The server terminates while doing a new SSL_accept. The client receives this error on the other side: 21298:error:140943FC:SSL routines:SSL3_READ_BYTES:sslv3 alert bad record mac:s3_pkt.c:1052:SSL alert number 20.

Even more bizarre, sometimes it handles fewer than 500 connections, sometimes fewer than 1000. There have been a few cases of 1000-5000 requests. Last week it crashed after about 2-3 weeks, with a request count in excess of 11000. It crashed again yesterday after running for less than 24 hours and handling 40000 requests. It crashed again today within 24 hours with 700 requests. After every crash, I changed different multi-threading options (both generic and OpenSSL-based) to make it work. However, during the last 2 runs no SSL-based functions or data were shared among threads, so it is not a case of multi-threading failing or a race condition causing the crash. Additionally, the application explicitly keeps the thread count under 10, so it can’t be an issue of memory unavailability. The server program is quite linear and does not use dynamic blocks of memory except for certain class/structure objects (but no arrays etc.), so an index overrun or anything similar is also not plausible. Just as a sanity check, I am also having my code reviewed by others.

The situation has become very urgent, as I have to deliver this by the coming Friday and I still don’t know what is causing the crash. The only plausible explanation I am left with is that 0.9.7a has some issue with SSL_accept. I am trying to get a newer version installed on the system. In the meantime, can anyone guide me with respect to this problem? Is this really a version issue, or is there anything else I need to look at?

Regards

Nauman Akbar

Concise Solutions



Nauman -

We have been battling this exact same situation for the last three weeks, to no avail. We're regretfully considering other options such as GnuTLS, which looks promising, although this may not be an option for you.

I wish we could have worked out the issues with OpenSSL. Perhaps it's our coding that is messing up, but we are unable to get any help with our problems. It seems as if this error is quite common and many people have had it, yet not many can explain it, and even fewer know how it's fixed. We've tried compiling and linking our app against OpenSSL-0.9.5 through OpenSSL-0.9.7g, with almost exactly the same error. I am telling you this to save you the time.

I'm not sure that this helps, but at least we understand what you're going through :)

Thanks
-dant
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
User Support Mailing List                    [email protected]
Automated List Manager                           [EMAIL PROTECTED]
