I'm curious if anyone has seen anything like this before.

We have a situation at one customer site. They see it happen every few days. No 
one else has reported it, and we can't reproduce it.

There's a Linux server, listening on multiple ports, handling lots of 
conversations (multiplexed with poll). Various protocols, some TLS, others not. 
Clients from many remote systems connect to this server. Some conversations are 
short-lived, others long-lived.

Four of the ports are handling Telnet (TN3270) traffic, over TLS.

Sometimes one of the ports stops responding to new conversations, from the 
client's point of view. Other clients continue to connect to other ports owned 
by the same server process; established conversations continue to work. After a 
while (maybe 15 minutes or so), the problem goes away. Note, again, that this 
only applies to new conversations on this one port. Everything else in the same 
process is happy.

A wire trace taken while the problem is occurring shows:

1. Client sends ClientHello; server stack ACKs it immediately.
2. A minute passes with no activity on the conversation.
3. Client gives up - we get a FIN from it. Server stack ACKs the FIN 
immediately.
4. Almost a minute and a half later (89 seconds in the case I'm looking at), 
the server happily sends the ServerHello. Well, that's a bit too late, and 
there's the usual crying and recriminations (RSTs from the client stack).

So nearly 2.5 minutes between ClientHello being received by the server 
machine's stack, and the ServerHello appearing on the wire. We know there's 
nothing generally wrong with the network or machines, and the processes in 
question are otherwise behaving normally.

ServerHello shows the server chose TLS_RSA_WITH_AES_256_CBC_SHA (TLS/1.0), so 
there's nothing screwy like computing DH parameters happening behind the 
covers. It's too early in the process for certificate validation callbacks to 
be invoked. Or for nearly anything else to be happening. All the server has is 
the ClientHello.

What I don't have at this point are tracepoints the customer could enable to 
see whether, say, SSL_accept is failing a lot with SSL_ERROR_WANT_READ or 
SSL_ERROR_WANT_WRITE (as reported by SSL_get_error). The socket should be in 
blocking mode, though it's possible there's some bug there.
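
One way to rule out the blocking-mode question would be to log the 
descriptor's file status flags just before the handshake. A minimal sketch, 
assuming POSIX fcntl; fd_is_nonblocking is a hypothetical helper name, not a 
function from the server code:

```c
#include <fcntl.h>

/* Hypothetical helper: reports whether O_NONBLOCK is set on fd.
 * Returns 1 if the descriptor is non-blocking, 0 if it is blocking,
 * and -1 if fcntl fails (errno is set by fcntl). */
int fd_is_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    return (flags & O_NONBLOCK) ? 1 : 0;
}
```

Calling this on the accepted descriptor right before SSL_accept would show 
whether O_NONBLOCK is set at handshake time. On Linux the accepted socket does 
not inherit O_NONBLOCK from the listening socket, so it would have to be set 
explicitly somewhere.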

The logic here is not exotic. It's along the lines of:
        desc = accept(master, ...);
        ssl = SSL_new(ctx);
        SSL_set_fd(ssl, desc);
        SSL_accept(ssl);

There's some setting of socket options like SO_KEEPALIVE and ex_data so we can 
recover our info in the callbacks, but really it's all pretty standard.
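
A tracepoint around the handshake call could be as small as a pair of 
monotonic-clock helpers. This is only a sketch under assumptions: trace_start 
and trace_elapsed are hypothetical names, and the logging threshold and output 
format are left to whatever the server already uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Hypothetical helpers for timing one blocking call.
 *
 * Sketch of use around the handshake (not compiled here):
 *     trace_start(&t0);
 *     rc = SSL_accept(ssl);
 *     secs = trace_elapsed(&t0);
 *     if (rc <= 0) err = SSL_get_error(ssl, rc);
 *     ...log desc, rc, err and secs...
 */

/* Records the current monotonic time in *t0. */
void trace_start(struct timespec *t0)
{
    clock_gettime(CLOCK_MONOTONIC, t0);
}

/* Returns the seconds elapsed since *t0 was recorded by trace_start. */
double trace_elapsed(const struct timespec *t0)
{
    struct timespec now;

    clock_gettime(CLOCK_MONOTONIC, &now);
    return (double)(now.tv_sec - t0->tv_sec)
         + (double)(now.tv_nsec - t0->tv_nsec) / 1e9;
}
```

CLOCK_MONOTONIC is used rather than wall-clock time so that clock adjustments 
on the server cannot distort the measured interval. Logging the elapsed time 
together with the SSL_get_error result would show whether the delay is spent 
inside SSL_accept or before it is ever called.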

Any ideas?

--
Michael Wojcik
Technology Specialist, Micro Focus

