Hi Fellas,

Went back this evening to re-examine the socket and OpenSSL code I
had written. I was curious to see if adding a ::sleep() call in
between the accept() and SSL_accept() calls made a difference.
None whatsoever. ::sleep()ing right before the SSL_accept() makes no
difference to NS62.

NS62 and FacilitiesLink hang during the handshake if I call accept(),
SSL_new(), SSL_set_fd() and SSL_accept() in sequence. MSIE and NS475
connect without problems. If I move the SSL_* calls to a worker
thread, so they run further down the production line, NS62 and
FacilitiesLink connect without errors.
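
A minimal sketch of that kind of handoff, assuming POSIX threads. It
is not the server's actual code: fd_queue, fdq_push(), fdq_pop(),
worker_main() and stub_handshake() are hypothetical names, and
stub_handshake() only marks the spot where SSL_new(), SSL_set_fd()
and SSL_accept() would run in a worker. The listener does nothing
but accept() and push the descriptor.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define FDQ_CAP 64  /* hypothetical queue capacity */

/* Bounded queue of accepted socket descriptors (hypothetical helper). */
typedef struct {
    int fds[FDQ_CAP];
    size_t head, count;
    pthread_mutex_t mu;
    pthread_cond_t not_empty, not_full;
} fd_queue;

void fdq_init(fd_queue *q) {
    assert(q != NULL);
    q->head = q->count = 0;
    pthread_mutex_init(&q->mu, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

/* Listener side: call right after accept(); no TLS work happens here. */
void fdq_push(fd_queue *q, int fd) {
    pthread_mutex_lock(&q->mu);
    while (q->count == FDQ_CAP)
        pthread_cond_wait(&q->not_full, &q->mu);
    q->fds[(q->head + q->count) % FDQ_CAP] = fd;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->mu);
}

/* Worker side: blocks until a descriptor is available. */
int fdq_pop(fd_queue *q) {
    pthread_mutex_lock(&q->mu);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->mu);
    int fd = q->fds[q->head];
    q->head = (q->head + 1) % FDQ_CAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->mu);
    return fd;
}

/* Handshake callback; > 0 means success, like SSL_accept() returning 1. */
typedef int (*handshake_fn)(int fd);

/* Placeholder only: a real worker would run SSL_new(), SSL_set_fd()
 * and SSL_accept() on fd here. */
int stub_handshake(int fd) {
    return fd >= 0 ? 1 : -1;
}

typedef struct {
    fd_queue *q;
    handshake_fn handshake;
    int handled;  /* connections whose handshake succeeded */
} worker_arg;

/* Worker thread: a stalled handshake blocks only this thread.
 * A negative descriptor is used as a shutdown sentinel. */
void *worker_main(void *p) {
    worker_arg *w = p;
    for (;;) {
        int fd = fdq_pop(w->q);
        if (fd < 0)
            break;
        if (w->handshake(fd) > 0)
            w->handled++;
        /* on failure a real server would close(fd) and move on */
    }
    return NULL;
}
```

In the listener, the loop body would reduce to accept() followed by
fdq_push(&q, fd). Each worker is started with pthread_create() on
worker_main() and its own worker_arg.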

Let me know if you have problems reproducing it. As I mentioned
yesterday, it seems like there's activity on the first (used-up)
connection while nothing ever gets sent across the new second
connection, causing OpenSSL's readsocket() macro to fail during
the initial stages of its ssl3_accept() handshake.
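
One way to keep a silent second connection from blocking forever is
to check for client bytes before starting the handshake. A minimal
sketch, assuming a POSIX system and a descriptor below FD_SETSIZE;
client_bytes_pending() and admit_or_close() are hypothetical helpers,
not OpenSSL APIs, and the timeout value is up to the caller.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not an OpenSSL API.
 * Returns  1 if the peer has sent at least one byte (e.g. a ClientHello),
 *          0 if nothing arrived within timeout_ms or the peer closed,
 *         -1 if select() or recv() failed (including EINTR).
 */
int client_bytes_pending(int fd, int timeout_ms) {
    fd_set rfds;
    struct timeval tv;
    char probe;

    assert(timeout_ms >= 0);
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int rc = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;  /* timed out: the client stayed silent */

    /* Readable can also mean EOF, so peek without consuming data. */
    ssize_t n = recv(fd, &probe, 1, MSG_PEEK);
    if (n < 0)
        return -1;
    return n > 0 ? 1 : 0;
}

/* Caller pattern: drop silent connections instead of blocking on them. */
int admit_or_close(int fd, int timeout_ms) {
    if (client_bytes_pending(fd, timeout_ms) == 1)
        return 1;  /* data is waiting; go on to SSL_accept() */
    close(fd);
    return 0;
}
```

Because the peek uses MSG_PEEK, the bytes stay in the socket buffer
for SSL_accept() to read afterwards.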

Let me know how the debugging goes. I'm curious to know why this
was occurring and if it was related to persistent connections.

Thanks,
- Roger


On Mon, 4 Mar 2002, Roger Anderson wrote:

> Hi Nelson, Bodo,
> 
> Thanks for your reply, Nelson. Hope the message I posted didn't come
> across sounding like I'm placing blame. Problems occur all the time.
> Looking to fix things, figure out a workaround. Thanks for taking
> the time to email back. The very good news is I found a workaround
> last night. This was important to me and to the users who are
> browsing our server with NS/MZ software. I very much want them to
> continue using NS/MZ browsers.
> 
> For help with debugging, here's a description of the situation and
> the workaround. Server is threaded. Secure listen socket accept()s,
> allocates an SSL structure with SSL_new(), associates it with the
> accepted socket with SSL_set_fd(), and calls into SSL_accept() to
> negotiate the TLS/SSL handshake with the client, handle the key
> exchange, etc.
> 
> First SSL_accept succeeds, server receives and processes the HTTP
> request, sends response back. Response is version 1.0 and includes
> connection close header, no pipelining yet. Second request from NS62
> hangs forever with blocking sockets, and times out (select) or loops
> forever (while) in asynchronous mode. NS47 and MSIE connect without
> problems. Same results with FreeBSD, Linux and Win32 builds of the
> server software. NS62 across SSL hangs on SSL_accept().
> 
> Traced through OpenSSL code at length. Second connection never sends
> any data. The SSL_accept() call fails in the initial stages of the
> second connection on the first read(). Nothing to receive. Nothing
> ever makes it across the wire (on the second socket). Hacked some
> changes, testing this, tweaking that. When I commented out the
> close() code (leaving the first socket open), it seemed like there
> was new data moving across the first socket, another client hello.
> That's what caused me to follow up with the client resume question.
> 
> I had to get a revision posted for this morning and the fallback
> position was to time out the NS62 clients with non-blocking I/O. Did
> that. Here's where it got interesting. In addition to the timeouts
> and non-blocking I/O, I relocated the SSL_accept() sequence to a
> part of the server code that's further down the production line,
> visible to worker threads. There's one listener and dozens of worker
> threads. By moving the handshake calls to a worker thread context,
> the risk to the server if the handshake fails is that it stalls a
> single thread, not the main accept loop. Did this too...
> 
> And it worked! NS62 connects successfully. Go figure. The second
> connection returns data. Handshake completes. Request is received,
> response is generated, connections are closed. What's going on?
> 
> Only thing I can think of is that it's timing-related. I didn't try
> sleep()ing the server in between the accept() call and SSL_accept().
> Also wondering whether the HTTP/1.0 lack of connection persistence
> had something to do with it. Maybe Mozilla is sending the second
> request down the original socket. I downloaded the NSS and Mozilla
> source. API and code look very clean (kudos). Goal was to build,
> step in and see what was going on with the client. It would have
> been interesting to view both sides of the system and see exactly
> what was going on.
> 
> Hope the feedback is helpful. As I said, it really 'bugs' me that I
> don't understand why this issue has gone away. I need to sign off on
> this thread though. Hardest part of our jobs is figuring out where
> to spend our time. The coding is relatively easy. Using our time
> wisely is the harder puzzle and I need to get back to application
> layer todos with our FacilitiesLink system.
> 
> Finally, just want to express my appreciation for the work you
> guys do on the browser side at MZ/NS, and to Bodo et al. with the
> OpenSSL libs. Your work makes the web application development we're 
> doing possible and for that I'm grateful. Keep up the great work.
> Let me know if you have any questions about the feedback above. Kind
> of curious if connection persistence over SSL had something to do
> with it. Very best of luck tracking it down. Enjoy the debugging!
> 
> - Roger
> 
> Roger Anderson, Director                             (858) 534-0692
> Campus Planning Data and Systems                [EMAIL PROTECTED]
> University of California, San Diego      http://facilities.ucsd.edu
> 
> 
> On Mon, 4 Mar 2002, Nelson Bolyard wrote:
> 
> > Mr. Anderson:
> > 
> > Your email message was forwarded to me.  I am Netscape's chief SSL engineer.
> > 
> > As you may know, Netscape's products use Netscape's own crypto and SSL
> > libraries, and do not use any OpenSSL code whatsoever.  So, Netscape
> > engineers (such as myself) typically do not follow the various OpenSSL 
> > mailing lists, web sites, and newsgroups.
> > 
> > I develop and maintain code in Netscape's NSS security libraries, the 
> > libraries used by Netscape products such as N6.2 and Netscape's various server
> > products to provide SSL and other cryptographic services.  
> > 
> > I have worked closely in the past with Dr. Stephen Henson and Bodo Moeller
> > to ensure that our respective libraries are and remain interoperable.  
> > But I have heard nothing about the problems that you report until now.
> > 
> > As I understand your message, you are running a server based on OpenSSL,
> > and that server (or server thread) appears to hang or timeout when N6.2
> > attempts to connect to it.  You are seeking help with getting the problem
> > corrected.  
> > 
> > Is that correct?
> > 
> > I am puzzled by the message subject "netscape 6.2 crash" because your 
> > message does not seem to say that Netscape 6.2 is crashing, but rather 
> > that OpenSSL is hanging.  Do I understand that correctly?
> > 
> > You refer to a set of messages on "openssl-dev" that "provide good
> > descriptions of the problem".  Could you send me URLs by which I could
> > view these messages, or forward copies of them to me?
> > 
> > Thanks.
> > 
> > --
> > Nelson Bolyard               Netscape Communications (subsidiary of AOL)
> > mailto:[EMAIL PROTECTED]  Communicator home page:  about:nelsonb
> > Disclaimer:                  I speak for myself, not for Netscape
> 
> 

______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [EMAIL PROTECTED]
Automated List Manager                           [EMAIL PROTECTED]
