Dear Users,

 

I have been having this problem for a long time. Initially I thought it was an issue with the multi-threading configuration, but the problem remains with multi-threading removed.

 

I have developed a simple SSL-based multi-threaded server application. Previously, OpenSSL data was shared among threads, but now all SSL functions are performed in a single thread. I am developing this application on RH9 using OpenSSL 0.9.7a. There is only one client connecting to this server, always using the same credentials. Both client and server use only ADH with SSLv3.

 

The problem is that SSL_accept sometimes fails, seemingly at random, and takes the server down with it. It may be a segmentation fault or some other exception. Since I am connecting to the machine remotely, I cannot monitor the application at all times (although I have tried), so I don’t know for certain what error is generated when the server application crashes.

 

One thing is always the same: the server terminates while doing a new SSL_accept. The client receives this error on the other side:

21298:error:140943FC:SSL routines:SSL3_READ_BYTES:sslv3 alert bad record mac:s3_pkt.c:1052:SSL alert number 20
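The hexadecimal code in the client's message can be unpacked by hand, which helps confirm what the client actually saw. This is a small sketch, assuming the packed error layout used by OpenSSL 0.9.x through 1.1.x (the same shifts and masks as the ERR_GET_LIB, ERR_GET_FUNC and ERR_GET_REASON macros; OpenSSL 3.x uses a different layout). The struct and function names are hypothetical.

```c
#include <stdio.h>

/* Fields of a packed OpenSSL error code (0.9.x to 1.1.x layout). */
struct ssl_err_fields {
    unsigned lib;    /* library number, bits 24..31 */
    unsigned func;   /* function code,  bits 12..23 */
    unsigned reason; /* reason code,    bits 0..11  */
};

/* Hypothetical helper: split a packed error code into its three fields. */
struct ssl_err_fields decode_err(unsigned long code)
{
    struct ssl_err_fields f;
    f.lib = (unsigned)((code >> 24) & 0xffUL);
    f.func = (unsigned)((code >> 12) & 0xfffUL);
    f.reason = (unsigned)(code & 0xfffUL);
    return f;
}

/* SSL alert reasons are stored as 1000 + alert number.
   Returns the alert number, or -1 if the reason is not an alert. */
int alert_number(unsigned reason)
{
    if (reason >= 1000 && reason < 1256)
        return (int)reason - 1000;
    return -1;
}

/* Print the decoded fields of one error code. */
void print_err(unsigned long code)
{
    struct ssl_err_fields f = decode_err(code);
    printf("lib=%u func=%u reason=%u alert=%d\n",
           f.lib, f.func, f.reason, alert_number(f.reason));
}
```

For 0x140943FC this gives library 20 (the SSL library), function 148 and reason 1020, that is, alert number 20 (bad_record_mac). As far as I can tell from s3_pkt.c, the "SSL alert number" suffix is attached when an alert is received from the peer, which would mean the server sent this alert before it went down. The same decoding is available from the command line with `openssl errstr 140943FC`.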

 

Even more bizarre is how much the number of connections handled before a crash varies: sometimes fewer than 500, sometimes fewer than 1000. There have been a few cases of 1000-5000 requests. Last week it crashed after about 2-3 weeks, with a request count in excess of 11000. It crashed again yesterday after running for less than 24 hours and handling 40000 requests. It crashed again today, within 24 hours, after 700 requests.

After every crash, I changed different multi-threading options (both generic and OpenSSL-based) to make it work. However, during the last 2 runs no SSL functions or data were shared among threads, so I do not believe that a multi-threading failure or race condition is causing the crash. Additionally, the application explicitly keeps the thread count under 10, so it can’t be a case of running out of memory. The server program is quite linear and does not use dynamically allocated memory except for certain class/structure objects (no arrays etc.), so an index overrun or anything similar is not plausible either. As a sanity check, I am also having my code reviewed by others.

 

The situation has become very urgent, as I have to deliver this by this coming Friday and I still don’t know what is causing the crash. The only plausible explanation I am left with is that 0.9.7a has an issue with SSL_accept. I am trying to get a newer version installed on the system. In the meantime, can anyone offer guidance on this problem? Is this really a version issue, or is there something else I need to look at?

 

Regards

Nauman Akbar

Concise Solutions
