Context: OpenSSL 0.9.8g, 0.9.8m, 0.9.8n, and 0.9.8o.  Running on
Windows Vista, compiled with MinGW on Windows XP.

Problem: With non-blocking I/O, a *second* call to
BIO_do_connect() always returns a positive value, even if the
underlying connection has not yet been established.  A subsequent
SSL_connect() then fails with SSL_ERROR_SYSCALL, and WSAGetLastError()
returns 10057 (WSAENOTCONN, "Socket is not connected").  Checking
BIO_should_retry() does not circumvent the problem; it is just as
misleading as BIO_do_connect().

Here's a rough outline of code to trigger the bug.

    BIO *_bio = BIO_new(BIO_s_connect());
    BIO_set_nbio(_bio, 1);
    BIO_set_conn_hostname(_bio, "foo.bar.baz:123"); // some hostname:port

    while(1) {
      int r = BIO_do_connect(_bio);

      // r == 0 in first iteration
      // r > 0 on second iteration (bug!)

      if (r > 0) {
        // connected
        break;
      }

      Sleep(1); // i.e., sleep for one *millisecond*
    }

On the first iteration, r == 0: the code sleeps for 1ms.  On the second
iteration, r > 0, always, regardless of whether the connection has
actually been established.  (A packet trace showed that the SYN/ACK
hadn't even been received from the server yet.)

Microsoft's connect() documentation actually hints at this problem;
see [http://bit.ly/bO43No]:

    Until the connection attempt completes on a nonblocking socket, all
    subsequent calls to connect on the same socket will fail with the
    error code WSAEALREADY, and WSAEISCONN when the connection completes
    successfully. Due to ambiguities in version 1.1 of the Windows
    Sockets specification, error codes returned from connect while
    a connection is already pending may vary among implementations. As
    a result, it is not recommended that applications use multiple calls
    to connect to detect connection completion. If they do, they must be
    prepared to handle WSAEINVAL and WSAEWOULDBLOCK error values the
    same way that they handle WSAEALREADY, to assure robust operation.
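The advice quoted above can be sketched as a small classifier for the
error code seen after a repeated connect() on a non-blocking socket.
This is only an illustration: classify_connect_error() and the
connect_state enum are hypothetical names, not part of OpenSSL or
Winsock, and the numeric fallbacks are there only so the sketch
compiles where <winsock2.h> is unavailable.

```c
#ifdef _WIN32
#include <winsock2.h>
#endif

/* Fallback values for the Winsock error codes (assumption: used only
 * when <winsock2.h> has not already defined them). */
#ifndef WSAEINVAL
#define WSAEINVAL      10022
#endif
#ifndef WSAEWOULDBLOCK
#define WSAEWOULDBLOCK 10035
#endif
#ifndef WSAEALREADY
#define WSAEALREADY    10037
#endif
#ifndef WSAEISCONN
#define WSAEISCONN     10056
#endif

/* Hypothetical result type for this sketch. */
enum connect_state { CONNECT_PENDING, CONNECT_DONE, CONNECT_FAILED };

/* Map the error from a repeated connect() call to a state, treating
 * WSAEINVAL and WSAEWOULDBLOCK the same way as WSAEALREADY, as the
 * quoted documentation requires. */
static enum connect_state classify_connect_error(int wsa_error)
{
    switch (wsa_error) {
    case WSAEALREADY:
    case WSAEINVAL:
    case WSAEWOULDBLOCK:
        return CONNECT_PENDING;   /* attempt still in progress */
    case WSAEISCONN:
        return CONNECT_DONE;      /* connection has completed */
    default:
        return CONNECT_FAILED;    /* anything else is a real error */
    }
}
```

A caller would pass the value of WSAGetLastError() taken right after
the failing connect().  Note that this only mirrors Microsoft's
description; it does not change what BIO_do_connect() reports.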

Increasing the Sleep() interval to a value greater than the time
required to establish the connection works around the problem.  For
example, a conservative 500ms works in most cases.  Needless to
say, this is an undesirable way to address the issue.

A cleaner work-around is to call select() on the underlying file
descriptor and wait for writability, as suggested by Microsoft in the
above article.  Here's a code outline that does that.

    BIO *_bio = BIO_new(BIO_s_connect());
    BIO_set_nbio(_bio, 1);
    BIO_set_conn_hostname(_bio, "foo.bar.baz:123"); // some hostname:port

    int connected = 0;

    do { // single pass; "break" leaves the block early
      if (BIO_do_connect(_bio) > 0) {
        // connection established
        connected = 1;
        break;
      }

      int fd = BIO_get_fd(_bio, NULL);
      fd_set wfd;
      FD_ZERO(&wfd);
      FD_SET((SOCKET) fd, &wfd);
      struct timeval tv;
      tv.tv_sec = 5;
      tv.tv_usec = 0;

      // first select() argument is ignored on Windows
      int rval = select(1, NULL, &wfd, NULL, &tv);

      // time limit expired
      if (rval == 0)
        break;

      // select() error, see WSAGetLastError() for more info
      if (rval == SOCKET_ERROR)
        break;

      // oddly, fd isn't ready for writing, shouldn't happen
      if (!FD_ISSET((SOCKET) fd, &wfd))
        break;

      // socket is writable: the connection has been established
      connected = 1;
    } while (0);


Kind regards,

Thomer Gil

______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [email protected]
Automated List Manager                           [email protected]
