Re: SSL_connect and SSL_accept deadlock!

2010-11-07 Thread David Schwartz


This may be a stretch, but did you confirm the socket is within the
range of sockets your platform allows you to 'select' on? For example,
Linux by default doesn't permit you to 'select' on socket numbers 1,024
and up (FD_SETSIZE is 1,024), though you can have more than 1,024 file
descriptors in use without a problem.
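
To make the check above concrete, here is a minimal sketch, not taken from the original poster's code. The helper names fd_fits_in_fd_set and wait_readable are hypothetical. The first guards a descriptor before it is handed to FD_SET; the second shows poll() as one possible alternative that does not depend on the descriptor's numeric value.

```c
#include <poll.h>
#include <sys/select.h>

/* Hypothetical helper: returns 1 if fd may safely be passed to FD_SET(),
 * 0 otherwise. Calling FD_SET() with fd >= FD_SETSIZE is undefined. */
static int fd_fits_in_fd_set(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Hypothetical helper: wait up to timeout_ms for fd to become readable.
 * Uses poll(), which has no FD_SETSIZE limit.
 * Returns >0 if an event is pending, 0 on timeout, -1 on error (errno set). */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd p;

    p.fd = fd;
    p.events = POLLIN;
    p.revents = 0;
    return poll(&p, 1, timeout_ms);
}
```

If fd_fits_in_fd_set(m_sock_fd) ever returns 0 in the failing process, the 'select' calls in ssl_retry cannot be trusted for that socket.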


DS

__
OpenSSL Project http://www.openssl.org
User Support Mailing List openssl-users@openssl.org
Automated List Manager   majord...@openssl.org


Re: SSL_connect and SSL_accept deadlock!

2010-11-03 Thread David Schwartz

On 11/2/2010 6:25 PM, Md Lazreg wrote:


> r=select(m_sock_fd + 1, &fds, 0, 0, ptv);
> if (r <= 0 && (Errno == EAGAIN || Errno == EINTR)) /*if we timed out with EAGAIN try again*/
> {
>     r = 1;
> }


This code is broken. If 'select' returns zero, checking errno is a 
mistake. (What is 'Errno' anyway?)



>    r = SSL_connect(m_ssl);
>    if (r > 0)
>    {
>       break;
>    }
>    r = ssl_retry(r);
>    if ( r <= 0)
>    {
>       break;
>    }
>    t = time(NULL) - time0;
> }


Err, what? Is an ssl_retry return of zero supposed to indicate a fatal 
error? The code in ssl_retry doesn't seem to follow this rule. (For 
example, consider if 'select' returns zero and errno is zero. That would 
indicate a timeout, not a fatal error.)



> int time0 = time(NULL);
> timeout=10 seconds;
> while (t<timeout)
> {
>    r = SSL_accept(m_ssl);
>    if (r > 0)
>    {
>       break;
>    }
>    r = ssl_retry(r);
>    if ( r <= 0)
>    {
>       break;
>    }
>    t = time(NULL) - time0;
> }
> if (t>=timeout)


There is no code to initially set 't'.
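
A minimal sketch of a retry loop that avoids both problems: the elapsed time is initialized before the first test, and a timeout is reported separately from a fatal error. Everything here is hypothetical (step_result, step_fn, run_until_deadline, count_down_step). The callback stands in for one SSL_connect or SSL_accept attempt plus the wait that follows it, so no OpenSSL calls appear.

```c
#include <time.h>

/* Hypothetical result codes for one attempt and for the whole loop. */
enum step_result {
    STEP_DONE = 1,        /* handshake finished */
    STEP_RETRY = 0,       /* not finished yet, call again */
    STEP_FATAL = -1,      /* unrecoverable error */
    STEP_TIMED_OUT = -2   /* overall deadline passed */
};

typedef enum step_result (*step_fn)(void *ctx);

/* Call step(ctx) repeatedly until it stops asking for a retry or until
 * timeout_seconds have elapsed. */
static enum step_result run_until_deadline(step_fn step, void *ctx,
                                           int timeout_seconds)
{
    time_t start = time(NULL);
    double elapsed = 0.0;            /* set before the first comparison */

    while (elapsed < timeout_seconds) {
        enum step_result r = step(ctx);
        if (r != STEP_RETRY)
            return r;                /* done or fatal */
        elapsed = difftime(time(NULL), start);
    }
    return STEP_TIMED_OUT;
}

/* Example step for illustration: asks for a retry until the counter
 * pointed to by ctx reaches zero, then reports success. */
static enum step_result count_down_step(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return STEP_RETRY;
    }
    return STEP_DONE;
}
```

With a counter of 3 and a generous deadline, run_until_deadline(count_down_step, &n, 5) calls the step four times and returns STEP_DONE. With a deadline of 0 it returns STEP_TIMED_OUT without calling the step at all.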

Also, an overall comment: Maybe it's just my taste, but your code seems 
to have a 'worst of both worlds' quality to it. It uses non-blocking 
sockets, but then finds clever ways to make the non-blocking operations 
act like blocking ones.


Is the server multithreaded? If so, I could see this as mere laziness 
(or, efficient use of coding resources to be more charitable) rather 
than actual poor design.


DS



Re: SSL_connect and SSL_accept deadlock!

2010-11-03 Thread Jeffrey Walton
On Wed, Nov 3, 2010 at 9:12 AM, David Schwartz <dav...@webmaster.com> wrote:
> On 11/2/2010 6:25 PM, Md Lazreg wrote:
>
>>          r=select(m_sock_fd + 1, &fds, 0, 0, ptv);
>>          if (r <= 0 && (Errno == EAGAIN || Errno == EINTR)) /*if we timed out with EAGAIN try again*/
>>          {
>>              r = 1;
>>          }
>
> This code is broken. If 'select' returns zero, checking errno is a mistake.
> (What is 'Errno' anyway?)
>
> [SNIP]
>
> Is the server multithreaded? If so, I could see this as mere laziness (or,
> efficient use of coding resources to be more charitable) rather than actual
> poor design.

lol


SSL_connect and SSL_accept deadlock!

2010-11-02 Thread Md Lazreg
I have an SSL client that connects to an SSL server. The server is able to
process thousands of clients just fine on a variety of platforms
[Windows/Linux/HP/Solaris] for long periods of time.

The problem that is driving me nuts is that from time to time, roughly once
every 24 hours, some client fails to connect to the server during the
handshake phase. This happens only on Linux and HP. Other platforms do not
experience this issue.

Here is a sketch of my client and server code. Please note that I am using
non-blocking sockets:

common code:
-
int ssl_retry(int ret)
{
   int r;
   fd_set fds;
   struct timeval tv, *ptv=0;
   tv.tv_sec = 1; /*do a select for 1 second each time*/
   tv.tv_usec = 0;
   ptv=&tv;
   FD_ZERO(&fds);

   switch(SSL_get_error(m_ssl, ret))
   {
   case SSL_ERROR_NONE:
      r = 1;
      break;
   case SSL_ERROR_WANT_READ:
      FD_SET(m_sock_fd, &fds);
      r=select(m_sock_fd + 1, &fds, 0, 0, ptv);
      if (r <= 0 && (Errno == EAGAIN || Errno == EINTR)) /*if we timed out with EAGAIN try again*/
      {
         r = 1;
      }
      break;
   case SSL_ERROR_WANT_WRITE:
      FD_SET(m_sock_fd, &fds);
      r=select(m_sock_fd + 1, 0, &fds, 0, ptv);
      if (r <= 0 && (Errno == EAGAIN || Errno == EINTR)) /*if we timed out with EAGAIN try again*/
      {
         r = 1;
      }
      break;
   case SSL_ERROR_ZERO_RETURN: /*The socket closed*/
      r = 0;
      break;
   case SSL_ERROR_SYSCALL:
   case SSL_ERROR_SSL:
      r = -1;
      break;
   default:
      r = -1;
   }
   return r;
}

client code:
-
int time0 = time(NULL);
timeout=10 seconds;
while (t<timeout)
{
  r = SSL_connect(m_ssl);
  if (r > 0)
  {
     break;
  }
  r = ssl_retry(r);
  if ( r <= 0)
  {
     break;
  }
  t = time(NULL) - time0;
}
if (t>=timeout)
{
  I timed out:(
}
if (r>0)
{
  We are connected. Do work.
}
else
{
  Some kind of an issue.
}

server code:
-
int time0 = time(NULL);
timeout=10 seconds;
while (t<timeout)
{
  r = SSL_accept(m_ssl);
  if (r > 0)
  {
     break;
  }
  r = ssl_retry(r);
  if ( r <= 0)
  {
     break;
  }
  t = time(NULL) - time0;
}
if (t>=timeout)
{
  I timed out:(
}
if (r>0)
{
  We are connected. Do work.
}
else
{
  Some kind of an issue.
}


When this problem happens, both the client and the server end up at the red
line above ("I timed out").

With some debugging effort I see that when this problem hits, both the
client and the server go repeatedly into the green section above. Each of
them seems to want to perform a read, since the code returned from both the
SSL_connect and the SSL_accept calls is SSL_ERROR_WANT_READ.

This looks to me like a deadlock where both my server and my client
want to do a READ until both of them time out!

Can someone please suggest what is wrong with the above code and how
this deadlock is possible? I am using openssl-1.0.0a.