Re: SSL_connect and SSL_accept deadlock!
This may be a stretch, but did you confirm the socket is within the range of sockets your platform allows you to 'select' on? For example, Linux by default doesn't permit you to 'select' on socket numbers 1,024 and up, though you can have more than 1,024 file descriptors in use without a problem.

DS

__
OpenSSL Project                          http://www.openssl.org
User Support Mailing List                openssl-users@openssl.org
Automated List Manager                   majord...@openssl.org
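A minimal sketch of the check suggested above, assuming a POSIX platform where `FD_SETSIZE` from `<sys/select.h>` bounds the descriptors an `fd_set` can hold. The helper name `fd_is_selectable` is made up for illustration and is not part of any library.

```c
#include <sys/select.h>

/* Hypothetical helper: returns 1 if fd can safely be placed in an fd_set,
 * 0 otherwise. Calling FD_SET() with fd >= FD_SETSIZE is undefined
 * behavior, so the check belongs before any FD_SET()/select() call. */
int fd_is_selectable(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}
```

If the check fails for `m_sock_fd`, the 'select' calls in ssl_retry cannot be trusted, and something like poll(), which has no such limit, would be the usual alternative.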
Re: SSL_connect and SSL_accept deadlock!
On 11/2/2010 6:25 PM, Md Lazreg wrote:

>     r = select(m_sock_fd + 1, &fds, 0, 0, ptv);
>     if (r <= 0 && (Errno == EAGAIN || Errno == EINTR)) /* if we timed out with EAGAIN try again */
>     {
>         r = 1;
>     }

This code is broken. If 'select' returns zero, checking errno is a mistake. (What is 'Errno' anyway?)

>         r = SSL_connect(m_ssl);
>         if (r > 0)
>         {
>             break;
>         }
>         r = ssl_retry(r);
>         if (r <= 0)
>         {
>             break;
>         }
>         t = time(NULL) - time0;
>     }

Err, what? Is an ssl_retry return of zero supposed to indicate a fatal error? The code in ssl_retry doesn't seem to follow this rule. (For example, consider if 'select' returns zero and errno is zero. That would indicate a timeout, not a fatal error.)

>     int time0 = time(NULL);
>     timeout = 10 seconds;
>     while (t < timeout)
>     {
>         r = SSL_accept(m_ssl);
>         if (r > 0)
>         {
>             break;
>         }
>         r = ssl_retry(r);
>         if (r <= 0)
>         {
>             break;
>         }
>         t = time(NULL) - time0;
>     }
>     if (t >= timeout)

There is no code to initially set 't'.

Also, an overall comment: Maybe it's just my taste, but your code seems to have a 'worst of both worlds' quality to it. It uses non-blocking sockets, but then finds clever ways to make the non-blocking operations act like blocking ones.

Is the server multithreaded? If so, I could see this as mere laziness (or, efficient use of coding resources to be more charitable) rather than actual poor design.

DS
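To make the point about 'select' concrete, here is a rough sketch of a wait helper that keeps the three outcomes apart: ready, timed out, and failed. It only consults errno when 'select' returns -1. The names `wait_for_fd` and `enum wait_result` are invented for this example; this is one possible shape, not the poster's code and not an OpenSSL API.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical result codes, kept distinct so that a timeout is never
 * mistaken for a fatal error. */
enum wait_result { WAIT_ERROR = -1, WAIT_TIMEOUT = 0, WAIT_READY = 1 };

/* Wait up to timeout_sec seconds for fd to become readable
 * (want_write == 0) or writable (want_write != 0). */
int wait_for_fd(int fd, int want_write, long timeout_sec)
{
    fd_set fds;
    struct timeval tv;
    int r;

    if (fd < 0 || fd >= FD_SETSIZE)
        return WAIT_ERROR;            /* fd cannot be used with select() */

    for (;;) {
        /* select() may modify both the set and the timeout, so they are
         * rebuilt on every pass. */
        FD_ZERO(&fds);
        FD_SET(fd, &fds);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;

        r = select(fd + 1,
                   want_write ? NULL : &fds,
                   want_write ? &fds : NULL,
                   NULL, &tv);
        if (r > 0)
            return WAIT_READY;
        if (r == 0)
            return WAIT_TIMEOUT;      /* errno is not meaningful here */
        if (errno != EINTR)
            return WAIT_ERROR;        /* r == -1 and a real failure */
        /* Interrupted by a signal: retry. For simplicity this restarts
         * the full timeout rather than tracking the remaining time. */
    }
}
```

With a helper like this, ssl_retry could map WAIT_TIMEOUT to "try the handshake again" and WAIT_ERROR to "give up" instead of folding both into one return value.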
Re: SSL_connect and SSL_accept deadlock!
On Wed, Nov 3, 2010 at 9:12 AM, David Schwartz <dav...@webmaster.com> wrote:

> On 11/2/2010 6:25 PM, Md Lazreg wrote:
>
>>     r = select(m_sock_fd + 1, &fds, 0, 0, ptv);
>>     if (r <= 0 && (Errno == EAGAIN || Errno == EINTR)) /* if we timed out with EAGAIN try again */
>>     {
>>         r = 1;
>>     }
>
> This code is broken. If 'select' returns zero, checking errno is a mistake. (What is 'Errno' anyway?)

[SNIP]

> Is the server multithreaded? If so, I could see this as mere laziness (or, efficient use of coding resources to be more charitable) rather than actual poor design.

lol
SSL_connect and SSL_accept deadlock!
I have an SSL client that connects to an SSL server. The server is able to process 1000s of clients just fine on a variety of platforms [Windows/Linux/HP/Solaris] for long periods of time.

The problem that is driving me nuts is that from time to time, about once every 24 hours, some client fails to connect to the server at the handshaking phase. This happens only on Linux and HP. Other platforms do not experience this issue.

Here is a sketch of my client and server code. Please note that I am using non-blocking sockets:

common code:
------------

    int ssl_retry(int ret)
    {
        int r;
        fd_set fds;
        struct timeval tv, *ptv = 0;

        tv.tv_sec = 1; /* do a select for 1 second each time */
        tv.tv_usec = 0;
        ptv = &tv;
        FD_ZERO(&fds);
        switch (SSL_get_error(m_ssl, ret))
        {
        case SSL_ERROR_NONE:
            r = 1;
            break;
        case SSL_ERROR_WANT_READ:
            FD_SET(m_sock_fd, &fds);
            r = select(m_sock_fd + 1, &fds, 0, 0, ptv);
            if (r <= 0 && (Errno == EAGAIN || Errno == EINTR)) /* if we timed out with EAGAIN try again */
            {
                r = 1;
            }
            break;
        case SSL_ERROR_WANT_WRITE:
            FD_SET(m_sock_fd, &fds);
            r = select(m_sock_fd + 1, 0, &fds, 0, ptv);
            if (r <= 0 && (Errno == EAGAIN || Errno == EINTR)) /* if we timed out with EAGAIN try again */
            {
                r = 1;
            }
            break;
        case SSL_ERROR_ZERO_RETURN: /* The socket closed */
            r = 0;
            break;
        case SSL_ERROR_SYSCALL:
        case SSL_ERROR_SSL:
            r = -1;
            break;
        default:
            r = -1;
        }
        return r;
    }

client code:
------------

    int time0 = time(NULL);
    timeout = 10 seconds;
    while (t < timeout)
    {
        r = SSL_connect(m_ssl);
        if (r > 0)
        {
            break;
        }
        r = ssl_retry(r);
        if (r <= 0)
        {
            break;
        }
        t = time(NULL) - time0;
    }
    if (t >= timeout)
    {
        I timed out :(
    }
    if (r > 0)
    {
        We are connected. Do work.
    }
    else
    {
        Some kind of an issue.
    }

server code:
------------

    int time0 = time(NULL);
    timeout = 10 seconds;
    while (t < timeout)
    {
        r = SSL_accept(m_ssl);
        if (r > 0)
        {
            break;
        }
        r = ssl_retry(r);
        if (r <= 0)
        {
            break;
        }
        t = time(NULL) - time0;
    }
    if (t >= timeout)
    {
        I timed out :(
    }
    if (r > 0)
    {
        We are connected. Do work.
    }
    else
    {
        Some kind of an issue.
    }

When this problem happens, both the client and the server end up at the "I timed out" line above (marked in red in my original message).

With some debugging effort I see that when this problem hits, both the client and the server go repeatedly into the section marked in green above (the SSL_ERROR_WANT_READ branch of ssl_retry). Each of them seems to want to perform a read, as the returned code is SSL_ERROR_WANT_READ from both the SSL_connect and the SSL_accept calls.

This looks to me like a deadlock situation where both my server and my client want to do a READ until both of them time out!

Can someone please suggest what is wrong with the above code and how this deadlock is possible?

I am using openssl-1.0.0a
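For comparison, the retry loops above could be sketched as below, with the elapsed time computed from a recorded start instead of an unset 't', and with "timed out" reported separately from "failed". Everything named here (`run_handshake`, `hs_step_fn`, the `hs_*` enums, `countdown_step`) is hypothetical. A real step function would call SSL_connect or SSL_accept, inspect SSL_get_error, and wait on the socket before returning; none of that is shown.

```c
#include <time.h>

/* Outcome of one handshake attempt (hypothetical names). */
enum hs_status { HS_DONE, HS_RETRY, HS_FATAL };
/* Outcome of the whole loop: timeout and failure stay distinct. */
enum hs_result { HS_CONNECTED, HS_TIMED_OUT, HS_FAILED };

typedef enum hs_status (*hs_step_fn)(void *ctx);

/* Calls step() until it finishes, fails, or timeout_sec seconds pass.
 * A real step is expected to block briefly (for example in select())
 * when it returns HS_RETRY, so this loop does not spin. */
enum hs_result run_handshake(hs_step_fn step, void *ctx, int timeout_sec)
{
    time_t start = time(NULL);

    for (;;) {
        switch (step(ctx)) {
        case HS_DONE:
            return HS_CONNECTED;
        case HS_FATAL:
            return HS_FAILED;
        case HS_RETRY:
            break;                    /* fall out to the timeout check */
        }
        if (difftime(time(NULL), start) >= timeout_sec)
            return HS_TIMED_OUT;
    }
}

/* Stand-in step for illustration only: asks for a retry *ctx times,
 * then reports success. */
enum hs_status countdown_step(void *ctx)
{
    int *remaining = ctx;
    return (*remaining)-- > 0 ? HS_RETRY : HS_DONE;
}
```

Keeping the three results separate means the caller never has to guess from 'r' and 't' together whether the handshake failed or merely ran out of time.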