Context: OpenSSL 0.9.8g, 0.9.8m, 0.9.8n, and 0.9.8o, running on
Windows Vista; compiled with MinGW on Windows XP.
Problem: With non-blocking I/O, a *second* call to BIO_do_connect()
always returns a positive value (success), even if the underlying
connection has not yet been established. A subsequent SSL_connect()
then fails with SSL_ERROR_SYSCALL, and WSAGetLastError() returns
10057 (WSAENOTCONN, "Socket is not connected"). Using
BIO_should_retry() does not get around the problem; it is just as
misleading as BIO_do_connect().
Here's a rough outline of code to trigger the bug.
BIO *_bio = BIO_new(BIO_s_connect());
BIO_set_nbio(_bio, 1);
BIO_set_conn_hostname(_bio, "foo.bar.baz:123"); // some hostname:port
while (1) {
    int r = BIO_do_connect(_bio);
    // r == 0 in first iteration
    // r > 0 on second iteration (bug!)
    if (r > 0) {
        // connected
        break;
    }
    Sleep(1); // i.e., sleep for one *millisecond*
}
On the first iteration, r == 0: the code sleeps for 1ms. On the second
iteration, r > 0, always, regardless of whether the connection has
actually been established. (A packet trace showed that the SYN/ACK
hadn't even been received from the server yet.)
Microsoft's connect() documentation actually hints at this problem;
see [http://bit.ly/bO43No]:
  Until the connection attempt completes on a nonblocking socket, all
  subsequent calls to connect on the same socket will fail with the
  error code WSAEALREADY, and WSAEISCONN when the connection completes
  successfully. Due to ambiguities in version 1.1 of the Windows
  Sockets specification, error codes returned from connect while
  a connection is already pending may vary among implementations. As
  a result, it is not recommended that applications use multiple calls
  to connect to detect connection completion. If they do, they must be
  prepared to handle WSAEINVAL and WSAEWOULDBLOCK error values the
  same way that they handle WSAEALREADY, to assure robust operation.
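For anyone who does poll connect() directly, the rule in that passage boils down to a small classifier. This is a minimal sketch, assuming the caller passes the WSAGetLastError() value seen after a repeated connect() returned SOCKET_ERROR; the enum and function names are hypothetical, and the numeric Winsock codes are hard-coded only so the sketch also builds off Windows.

```c
#ifdef _WIN32
#include <winsock2.h>
#else
// Winsock error codes, hard-coded only so this sketch builds off Windows.
#define WSAEINVAL      10022
#define WSAEWOULDBLOCK 10035
#define WSAEALREADY    10037
#define WSAEISCONN     10056
#endif

// Hypothetical names, for illustration only.
enum connect_status { CONNECT_PENDING, CONNECT_DONE, CONNECT_FAILED };

// Classify the WSAGetLastError() value seen after a repeated connect()
// on a non-blocking socket returned SOCKET_ERROR, following the rule
// in Microsoft's connect() documentation quoted above.
static enum connect_status classify_connect_error(int wsa_err)
{
    switch (wsa_err) {
    case WSAEALREADY:      // attempt still in progress
    case WSAEINVAL:        // some stacks report a pending connect this way
    case WSAEWOULDBLOCK:   // ... or this way
        return CONNECT_PENDING;
    case WSAEISCONN:       // connection completed successfully
        return CONNECT_DONE;
    default:               // anything else is treated as a real failure
        return CONNECT_FAILED;
    }
}
```

Note that Microsoft explicitly discourages this polling style; waiting on the socket with select() avoids the ambiguity altogether.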
Increasing the Sleep() interval beyond the time required to establish
the connection works around the problem; a conservative 500 ms works
in most cases. Needless to say, this is an undesirable fix: it adds
latency to every connection and still fails whenever connection setup
takes longer than the chosen delay.
A cleaner workaround is to call select() on the underlying file
descriptor and wait for it to become writable, as suggested in
Microsoft's article above. Here's a code outline that does that.
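#include <winsock2.h>
#include <openssl/bio.h>

// Illustrative wrapper (the name is not part of OpenSSL). Returns a
// connected BIO, or NULL on error or timeout. Assumes WSAStartup() has
// already been called.
static BIO *connect_with_timeout(const char *hostport, long timeout_sec)
{
    BIO *_bio;
    int r, fd, rval;
    fd_set wfd, efd;
    struct timeval tv;

    _bio = BIO_new(BIO_s_connect());
    if (_bio == NULL)
        return NULL;
    BIO_set_nbio(_bio, 1);
    BIO_set_conn_hostname(_bio, hostport); // some hostname:port

    r = BIO_do_connect(_bio);
    if (r > 0)
        return _bio;                  // connection established
    if (!BIO_should_retry(_bio))
        goto fail;                    // hard error, not "still connecting"

    fd = BIO_get_fd(_bio, NULL);
    FD_ZERO(&wfd);
    FD_ZERO(&efd);
    FD_SET((SOCKET) fd, &wfd);
    FD_SET((SOCKET) fd, &efd);        // Winsock reports a failed connect here
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;
    // first select() argument is ignored on windows
    rval = select(1, NULL, &wfd, &efd, &tv);
    if (rval == 0)
        goto fail;                    // time limit expired
    if (rval == SOCKET_ERROR)
        goto fail;                    // see WSAGetLastError() for more info
    if (FD_ISSET(fd, &efd) || !FD_ISSET(fd, &wfd))
        goto fail;                    // connect failed, or fd oddly not writable
    // the socket is connected now, so this call should report success
    if (BIO_do_connect(_bio) <= 0)
        goto fail;
    return _bio;

fail:
    BIO_free_all(_bio);
    return NULL;
}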
Kind regards,
Thomer Gil
______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List [email protected]
Automated List Manager [email protected]