Hi Fellas,

Went back this evening to re-examine the socket and OpenSSL code I had
written. I was curious to see whether adding a ::sleep() call between the
accept() and SSL_accept() calls made a difference. None whatsoever.
Calling ::sleep() right before SSL_accept() makes no difference to NS62.
NS62 and FacilitiesLink hang during the handshake if I call accept(),
SSL_new(), SSL_set_fd() and SSL_accept() in sequence. MSIE and NS475
connect without problems. If I move the SSL_* calls to a worker thread,
so that they occur further down the production line, NS62 and
FacilitiesLink connect without errors. Let me know if you have problems
reproducing this.

As I mentioned yesterday, it seems like there's activity on the first
(used-up) connection while nothing ever gets sent across the new second
connection, causing OpenSSL's ReadSocket() macro to fail during the
initial stages of its ssl3_accept() handshake.

Let me know how the debugging goes. I'm curious to know why this was
occurring and whether it was related to persistent connections.

Thanks,
- Roger

On Mon, 4 Mar 2002, Roger Anderson wrote:

> Hi Nelson, Bodo,
>
> Thanks for your reply, Nelson. Hope the message I posted didn't come
> across sounding like I'm placing blame. Problems occur all the time.
> Looking to fix things, figure out a workaround. Thanks for taking
> the time to email back. The very good news is I found a workaround
> last night. This was important to me and to the users who are
> browsing our server with NS/MZ software. I very much want them to
> continue using NS/MZ browsers.
>
> For help with debugging, here's a description of the situation and
> the workaround. Server is threaded. Secure listen socket accept()s,
> allocates an SSL structure with SSL_new(), associates it with the
> accepted socket with SSL_set_fd(), and calls into SSL_accept() to
> negotiate the TLS/SSL handshake with the client, handle the key
> exchange, etc.
>
> First SSL_accept() succeeds, server receives and processes the HTTP
> request, sends response back. Response is version 1.0 and includes
> a "Connection: close" header, no pipelining yet. Second request from
> NS62 hangs forever with blocking sockets, and times out (select) or
> loops forever (while) in asynchronous mode. NS47 and MSIE connect
> without problems.
> Same results with FreeBSD, Linux and Win32 builds of the
> server software. NS62 across SSL hangs on SSL_accept().
>
> Traced through OpenSSL code at length. Second connection never sends
> any data. The SSL_accept() call fails in the initial stages of the
> second connection on the first read(). Nothing to receive. Nothing
> ever makes it across the wire (on the second socket). Hacked some
> changes, testing this, tweaking that. When I commented out the
> close() code (leaving the first socket open), it seemed like there
> was new data moving across the first socket, another client hello.
> That's what caused me to follow up with the client resume question.
>
> I had to get a revision posted for this morning and the fallback
> position was to time out the NS62 clients with non-blocking I/O. Did
> that. Here's where it got interesting. In addition to the timeouts
> and non-blocking I/O, I relocated the SSL_accept() sequence to a
> part of the server code that's further down the production line,
> visible to worker threads. There's one listener and dozens of worker
> threads. By moving the handshake calls to a worker thread context,
> the risk to the server if the handshake fails is that it stalls a
> single thread, not the main accept loop. Did this too...
>
> And it worked! NS62 connects successfully. Go figure. The second
> connection returns data. Handshake completes. Request is received,
> response is generated, connections are closed. What's going on?
>
> Only thing I can think of is that it's timing-related. I didn't try
> sleep()ing the server in between the accept() call and SSL_accept().
> Also wondering whether the HTTP/1.0 lack of connection persistence
> had something to do with it. Maybe Mozilla is sending the second
> request down the original socket. I downloaded the NSS and Mozilla
> source. API and code look very clean (kudos). Goal was to build,
> step in and see what was going on with the client.
> It would have been interesting to view both sides of the system and
> see exactly what was going on.
>
> Hope the feedback is helpful. As I said, it really 'bugs' me that I
> don't understand why this issue has gone away. I need to sign off on
> this thread though. Hardest part of our jobs is figuring out where
> to spend our time. The coding is relatively easy. Using our time
> wisely is the harder puzzle and I need to get back to application
> layer todos with our FacilitiesLink system.
>
> Finally, just want to express my appreciation for the work you
> guys do on the browser side at MZ/NS, and to Bodo et al. with the
> OpenSSL libs. Your work makes the web application development we're
> doing possible and for that I'm grateful. Keep up the great work.
> Let me know if you have any questions about the feedback above. Kind
> of curious whether connection persistence over SSL had something to
> do with it. Very best of luck tracking it down. Enjoy the debugging!
>
> - Roger
>
> Roger Anderson, Director                  (858) 534-0692
> Campus Planning Data and Systems          [EMAIL PROTECTED]
> University of California, San Diego       http://facilities.ucsd.edu
>
>
> On Mon, 4 Mar 2002, Nelson Bolyard wrote:
>
> > Mr. Anderson:
> >
> > Your email message was forwarded to me. I am Netscape's chief SSL engineer.
> >
> > As you may know, Netscape's products use Netscape's own crypto and SSL
> > libraries, and do not use any OpenSSL code whatsoever. So, Netscape
> > engineers (such as myself) typically do not follow the various OpenSSL
> > mailing lists, web sites, and newsgroups.
> >
> > I develop and maintain code in Netscape's NSS security libraries, the
> > libraries used by Netscape products such as N6.2 and Netscape's various
> > server products to provide SSL and other cryptographic services.
> >
> > I have worked closely in the past with Dr. Stephen Henson and Bodo Moeller
> > to ensure that our respective libraries are and remain interoperable.
> > But I have heard nothing about the problems that you report until now.
> >
> > As I understand your message, you are running a server based on OpenSSL,
> > and that server (or server thread) appears to hang or time out when N6.2
> > attempts to connect to it. You are seeking help with getting the problem
> > corrected.
> >
> > Is that correct?
> >
> > I am puzzled by the message subject "netscape 6.2 crash" because your
> > message does not seem to say that Netscape 6.2 is crashing, but rather
> > that OpenSSL is hanging. Do I understand that correctly?
> >
> > You refer to a set of messages on "openssl-dev" that "provide good
> > descriptions of the problem". Could you send me URLs by which I could
> > view these messages, or forward copies of them to me?
> >
> > Thanks.
> >
> > --
> > Nelson Bolyard          Netscape Communications (subsidiary of AOL)
> > mailto:[EMAIL PROTECTED]      Communicator home page: about:nelsonb
> > Disclaimer: I speak for myself, not for Netscape
> >

______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [EMAIL PROTECTED]
Automated List Manager                           [EMAIL PROTECTED]
