SSL_connect and SSL_accept deadlock!
I have an SSL client that connects to an SSL server. The server is able to process 1000s of clients just fine on a variety of platforms [Windows/Linux/HP/Solaris] for long periods of time.

The problem that is driving me nuts is that from time to time, like once every 24 hours, some client fails to connect to the server at the handshaking phase. This happens only on Linux and HP. Other platforms do not experience this issue.

Here is a sketch of my client and server code. Please note that I am using non-blocking sockets.

common code:

    int ssl_retry(int ret)
    {
        int r;
        fd_set fds;
        struct timeval tv, *ptv = 0;

        tv.tv_sec = 1; /* do a select for 1 second each time */
        tv.tv_usec = 0;
        ptv = &tv;
        FD_ZERO(&fds);
        switch (SSL_get_error(m_ssl, ret)) {
        case SSL_ERROR_NONE:
            r = 1;
            break;
        case SSL_ERROR_WANT_READ:
            FD_SET(m_sock_fd, &fds);
            r = select(m_sock_fd + 1, &fds, 0, 0, ptv);
            if (r <= 0 && (Errno == EAGAIN || Errno == EINTR)) /* if we timed out with EAGAIN try again */
            {
                r = 1;
            }
            break;
        case SSL_ERROR_WANT_WRITE:
            FD_SET(m_sock_fd, &fds);
            r = select(m_sock_fd + 1, 0, &fds, 0, ptv);
            if (r <= 0 && (Errno == EAGAIN || Errno == EINTR)) /* if we timed out with EAGAIN try again */
            {
                r = 1;
            }
            break;
        case SSL_ERROR_ZERO_RETURN: /* The socket closed */
            r = 0;
            break;
        case SSL_ERROR_SYSCALL:
        case SSL_ERROR_SSL:
            r = -1;
            break;
        default:
            r = -1;
        }
        return r;
    }

client code:

    int time0 = time(NULL);
    timeout = 10 seconds;
    while (t < timeout) {
        r = SSL_connect(m_ssl);
        if (r > 0) { break; }
        r = ssl_retry(r);
        if (r <= 0) { break; }
        t = time(NULL) - time0;
    }
    if (t >= timeout) { I timed out :( }
    if (r > 0) { We are connected. Do work. }
    else { Some kind of an issue. }

server code:

    int time0 = time(NULL);
    timeout = 10 seconds;
    while (t < timeout) {
        r = SSL_accept(m_ssl);
        if (r > 0) { break; }
        r = ssl_retry(r);
        if (r <= 0) { break; }
        t = time(NULL) - time0;
    }
    if (t >= timeout) { I timed out :( }
    if (r > 0) { We are connected. Do work. }
    else { Some kind of an issue. }

When this problem happens, both the client and the server end up in the "I timed out" branch above.

With some debugging effort I see that when this problem hits, both the client and the server go repeatedly into the SSL_ERROR_WANT_READ case above. Each one of them seems to want to perform a read, as the returned code is SSL_ERROR_WANT_READ from both the SSL_connect and the SSL_accept calls.

This looks to me like a deadlock situation where both my server and my client want to do a READ until both of them time out!

Can someone please suggest what is wrong with the above code and how this deadlock is possible?

I am using openssl-1.0.0a
Re: WSAEWOULDBLOCK versus WSAECONNREFUSED
Here is how my ErrorSet is constructed:

    fd_set WriteSet;
    FD_ZERO(&WriteSet);
    FD_SET(m_sock_fd, &WriteSet);
    fd_set ErrorSet;
    FD_ZERO(&ErrorSet);
    FD_SET(m_sock_fd, &ErrorSet);
    status = select(m_sock_fd+1, NULL, &WriteSet, &ErrorSet, &tv);
    if (FD_ISSET(m_sock_fd, &WriteSet)) {
        cout << "Socket in the write set" << endl << flush;
    }
    if (FD_ISSET(m_sock_fd, &ErrorSet)) {
        cout << "Socket in the error set" << endl << flush;
    }

I am not saying that that is how it should behave. I am saying that this is how it is behaving. With the above code, when my server is down, my Windows client outputs "Socket in the error set" only, which means that the socket was put in the ErrorSet and was not put in the WriteSet. Under the same conditions and using the same code, the UNIX client will put the socket in the WriteSet.

Of course I do not know why Windows behaves this way, do you know?

Thanks

On Tue, Aug 25, 2009 at 6:42 PM, David Schwartz dav...@webmaster.com wrote:

> Md Lazreg wrote:
>
> > I do not know why you think my new change allows me to detect soft failures. The only change I made is to change this:
> >
> >     status = select(m_sock_fd+1, NULL, &WriteSet, NULL, &tv);
> >
> > to this:
> >
> >     status = select(m_sock_fd+1, NULL, &WriteSet, &ErrorSet, &tv);
> >
> > Are you saying that for a soft failure, Windows will still put the socket in the ErrorSet?
>
> How is your 'ErrorSet' constructed?
>
> And you're asking the wrong question. The question you should ask yourself is -- how can there be a hard error yet the socket not yet be ready for writing? What could I possibly still be waiting for?
>
> DS
> __
> OpenSSL Project http://www.openssl.org
> User Support Mailing List openssl-users@openssl.org
> Automated List Manager majord...@openssl.org
Re: WSAEWOULDBLOCK versus WSAECONNREFUSED
I do not know why you think my new change allows me to detect soft failures. The only change I made is to change this:

    status = select(m_sock_fd+1, NULL, &WriteSet, NULL, &tv);

to this:

    status = select(m_sock_fd+1, NULL, &WriteSet, &ErrorSet, &tv);

Are you saying that for a soft failure, Windows will still put the socket in the ErrorSet?

Thanks

On Mon, Aug 24, 2009 at 8:14 PM, David Schwartz dav...@webmaster.com wrote:

> Md Lazreg wrote:
>
> > It is possible that the previous Windows behavior is correct but that is not the behavior I want.
>
> I think you are incorrect about that.
>
> > I want the same behavior as UNIX which in my opinion is what my clients would want. My clients can connect to a set of servers in a row, if one is not available for whatever reason I want them to move to the next one instead of having to wait the whole timeout before trying the next server.
>
> I agree. But that's not what your code does now. What your code does is stop trying the first server. What you want it to do is start trying the second server.
>
> Here's probably what you want:
>
> 1) Start trying to connect to the first server.
> 2) Wait a short amount of time to see if we have a connection.
> 3) If we have a connection, we are done. We succeed.
> 4) If we don't have a connection, add another attempt to another server, if possible.
> 5) If all connection possibilities have failed, stop. We fail.
> 6) Go to step 2.
>
> Note that this does not require the change you made, which allows you to detect soft failures. If you get a soft failure, there is no reason to abort the attempt -- it still might succeed. And why would you want to wait 60 seconds or so if a server is not responding at all if you have another server you could try?
>
> > Thanks for your help.
>
> You're welcome. I'm glad you got it working the way you think you want it. But I don't think it's working the way you should want it. There is no rush to abort a connection attempt that might ultimately succeed, no matter how unlikely. Just don't wait for it -- keep going, and if it fails, no loss. If it succeeds later, you still win.
>
> DS
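The six steps listed in the quoted reply could be sketched roughly as follows. This is an illustration under assumptions, not code from the thread: it uses POSIX poll() and IPv4 addresses, and the names connect_any, start_connect, MAX_ATTEMPTS, step_ms and max_rounds are all hypothetical. Earlier attempts stay pending while later servers are tried, so a slow server can still win.

```c
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_ATTEMPTS 8   /* illustrative limit, not from the thread */

/* Begin a non-blocking connect. Returns the socket, or -1 if it failed at once. */
static int start_connect(const struct sockaddr_in *addr)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);
    if (connect(fd, (const struct sockaddr *)addr, sizeof *addr) == 0 ||
        errno == EINPROGRESS)
        return fd;
    close(fd);
    return -1;
}

/* Returns a connected (still non-blocking) socket, or -1 if every attempt
 * failed or max_rounds waits of step_ms each went by without success. */
int connect_any(const struct sockaddr_in *addrs, int n, int step_ms, int max_rounds)
{
    struct pollfd pend[MAX_ATTEMPTS];
    int started = 0, active = 0, winner = -1;

    if (n > MAX_ATTEMPTS)
        n = MAX_ATTEMPTS;

    for (int round = 0; round < max_rounds && winner < 0; round++) {
        /* Steps 1 and 4: add one more attempt while untried servers remain. */
        while (started < n) {
            int fd = start_connect(&addrs[started++]);
            if (fd >= 0) {
                pend[active].fd = fd;
                pend[active].events = POLLOUT;
                pend[active].revents = 0;
                active++;
                break;
            }
        }
        if (active == 0)
            break;                      /* step 5: nothing left to wait for */

        /* Step 2: wait a short time on all pending attempts. */
        int r = poll(pend, active, step_ms);
        if (r < 0 && errno != EINTR)
            break;

        /* Step 3: look at each attempt that finished. */
        for (int i = 0; r > 0 && i < active; ) {
            int err = 0;
            socklen_t len = sizeof err;
            if (pend[i].revents == 0) {
                i++;
                continue;
            }
            if (getsockopt(pend[i].fd, SOL_SOCKET, SO_ERROR, &err, &len) == 0 && err == 0)
                winner = pend[i].fd;    /* connected */
            else
                close(pend[i].fd);      /* this attempt failed; others go on */
            pend[i] = pend[--active];   /* drop entry i from the pending list */
            if (winner >= 0)
                break;
        }
    }

    for (int i = 0; i < active; i++)    /* abandon attempts still in flight */
        close(pend[i].fd);
    return winner;
}
```

A caller might use something like connect_any(servers, count, 250, 80) to add a new server every quarter second for up to about 20 seconds; those numbers are placeholders, not recommendations from the thread.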
Re: WSAEWOULDBLOCK versus WSAECONNREFUSED
Thanks all.

UNIX and Windows return EINPROGRESS and WSAEWOULDBLOCK respectively after a non-blocking connect. [I was confused about this before but now I understand it.]

The difference between UNIX and Windows is in the select system call that comes after the connect call. On UNIX the select will return after signaling the write set [as documented]. On Windows the select will return after signaling the _error_ set [the MSDN documentation says that you need to check the write set].

My problem was that my select was checking only for write-ability [which is what the documentation says for both Windows and UNIX]. So on Windows I was forced to time out while on UNIX I was returning immediately.

Now I changed my select to check for both write-ability and error-ability on Windows. Then I call getsockopt, which returns WSAECONNREFUSED as I have been expecting.

Thanks again.

On Mon, Aug 24, 2009 at 12:40 AM, David Schwartz dav...@webmaster.com wrote:

> Md Lazreg wrote:
>
> > When my SSL server is up and running everything works as expected. When my SSL server is down, my client times out in 20 seconds because WSAGetLastError() returns WSAEWOULDBLOCK even when my server is not listening! I expect WSAGetLastError() to return WSAECONNREFUSED when my server is not listening... The problem I have with this is that my client is forced to wait for 20 seconds before giving up. I expect it to return immediately if the SSL server is not listening... Am I missing something? Thanks.
>
> Why? The SSL server might be restarting. Perhaps it will be listening again in a second or two. It takes as long as it takes to ensure that the server is not listening and will not resume listening.
>
> This is one of the differences between Windows and traditional UNIX systems. On Windows, if a server is overloaded, it refuses connections rather than silently ignoring them. As a result, when a client gets a connection refused, it cannot assume the server is not listening. It's possible the server is overloaded. So it has to try again, which takes some time.
>
> > My question is why _using the same code_ Windows is returning WSAEWOULDBLOCK instead of WSAECONNREFUSED when my server is down? while UNIX correctly returns ECONNREFUSED...
>
> Because Windows cannot tell whether your server is down or overloaded. UNIX assumes that it is down, which may or may not be correct.
>
> The Windows client behavior you are seeing is correct, but only because it is assuming Windows server behavior that is incorrect. The UNIX behavior is incorrect -- it cannot be sure your server is actually down, but assumes so anyway -- but only because it assumes the server will behave correctly. Because Windows servers do not behave correctly, Windows clients are forced to behave incorrectly.
>
> I have yet to figure out why things are this way, but this is the way they are. It appears to be a deliberate Microsoft decision that we all have to live with.
>
> To summarize: When Windows servers are overloaded, they reject connections rather than ignoring them. Thus, a client that sees a rejected connection cannot be sure the server is not running -- it could just be overloaded. (So you are correctly told the connection could not be made at that time, but might succeed later.)
>
> It's also possible the server is restarting, and will be accepting connections again in a second or two. The Windows client checks for this as well.
>
> DS
Re: WSAEWOULDBLOCK versus WSAECONNREFUSED
With my new code, if my server is overloaded and cannot accept a connection immediately, my Windows client does not wait the whole timeout. This is the behavior I want. I do not want it to be sitting there just in case my server becomes available again or is in the process of being restarted.

My original code was behaving like this:

On UNIX:

1) Make a nonblocking connect call
2) I get EINPROGRESS
3) Do a select call that checks for writability of the socket [as the UNIX man page says]
4) The select returns BEFORE the timeout and getsockopt shows me an error of ECONNREFUSED. [This is the behavior I want]

On Windows:

1) Make a nonblocking connect call
2) I get WSAEWOULDBLOCK
3) Do a select call that checks for writability of the socket [as the MSDN documentation says]
4) The select returns AFTER the timeout [This is the behavior I did not like: since my server is down I want my client to return immediately. I do not care if the server is going to be restarted, or if it was temporarily overloaded, or anything else]

My new code changes step 3) in the Windows code to check for writability and errorability by changing the following:

    status = select(m_sock_fd+1, NULL, &WriteSet, NULL, &tv);

to this:

    status = select(m_sock_fd+1, NULL, &WriteSet, &ErrorSet, &tv);

Now Windows and UNIX behave the same. When my server is down or overloaded, the select call in the Windows client puts the socket in the ErrorSet and returns immediately; calling getsockopt then shows me an error of WSAECONNREFUSED.

It is possible that the previous Windows behavior is correct but that is not the behavior I want. I want the same behavior as UNIX which in my opinion is what my clients would want. My clients can connect to a set of servers in a row, if one is not available for whatever reason I want them to move to the next one instead of having to wait the whole timeout before trying the next server.

Thanks for your help.

On Mon, Aug 24, 2009 at 6:47 PM, David Schwartz dav...@webmaster.com wrote:

> Md Lazreg wrote:
>
> > On UNIX the select will return after signaling the write set [ as documented]. On Windows the select will return after signaling the _error_ set [The MSDN documentation says that you need to check the write set].
>
> That makes no sense.
>
> > My problem was that my select was checking only for write-ability [which is what the documentation says for both Windows and UNIX]. So on Windows I was forced to timeout while on UNIX I was returning immediately.
>
> Your code was correct.
>
> > Now I changed my select to check for both write-ability and error-ability on Windows. Then I call getsockopt which returns WSAECONNREFUSED as I have been expecting.
>
> That makes no sense. I'm glad it's working for you, but that doesn't make any sense.
>
> What does your code do now if the server is overloaded and cannot accept a connection immediately?
>
> DS
WSAEWOULDBLOCK versus WSAECONNREFUSED
Hello,

I have a Windows client that tries to connect to an SSL server using the following code:

    int status = ::connect(m_sock_fd, (sockaddr *)&m_addr, sizeof(m_addr));
    if (status == 0) {
        return true;
    } else {
        if (WSAGetLastError() == WSAEWOULDBLOCK) {
            struct timeval tv;
            tv.tv_sec = 20;
            tv.tv_usec = 0;
            fd_set myset;
            FD_ZERO(&myset);
            FD_SET(m_sock_fd, &myset);
            status = select(m_sock_fd+1, NULL, &myset, NULL, &tv);
            /*some other code here*/
        } else if (WSAGetLastError() == WSAECONNREFUSED) {
            return false;
        } else {
            /*some other code*/
        }
    }

The socket I am using is NONBLOCKING.

When my SSL server is up and running everything works as expected.

When my SSL server is down, my client times out in 20 seconds because WSAGetLastError() returns WSAEWOULDBLOCK even when my server is not listening! I expect WSAGetLastError() to return WSAECONNREFUSED when my server is not listening...

The problem I have with this is that my client is forced to wait for 20 seconds before giving up. I expect it to return immediately if the SSL server is not listening...

Am I missing something?

Thanks.
Re: WSAEWOULDBLOCK versus WSAECONNREFUSED
Thank you Ger for your reply.

It is true that by using a nonblocking connect I want an instant answer but most importantly I want a correct answer.

Using the same code under UNIX I get two instant correct answers:

    ECONNREFUSED [If my SSL server is down]
    EINPROGRESS [If my SSL server is up and listening]

But under Windows I get the same answer regardless of the state of my SSL server:

    WSAEWOULDBLOCK

My question is why _using the same code_ Windows is returning WSAEWOULDBLOCK instead of WSAECONNREFUSED when my server is down? while UNIX correctly returns ECONNREFUSED...

Thanks

On Sun, Aug 23, 2009 at 5:04 PM, Ger Hobbelt g...@hobbelt.com wrote:

> Since you use a nonblocking connect, you're essentially telling the software you want instant return. Which is what you get. Given that a TCP connection takes a little time (three network travels at least), that's definitely more time than you wish to wait given your nonblocking intent, so the IP stack properly and correctly tells you it'll take some time before you get the final result -- wouldblock.
>
> The next bit is where I'm a bit rusty (read as: the peculiarities of WinSock have eroded in my brain), but a glance at the code shows you're only select()ing for writing and IIRC a finalized non-blocking connect is equivalent to a 'ready-for-READING' select signal - for which you are not listening in your select as that argument is NULL. Hence my advice to also pass your handle in a separate fdset to select(read), i.e.
>
>     status = select(m_sock_fd+1, &myreadset, &mywriteset, NULL, &tv);
>
> (so you can tell upon return which one fired)
>
> and lastly there's the ever-there mind-you nitpick that nonblocking I/O is well served with a statemachine around it, i.e. a loop, which tracks the current state of your connections/activity and acts upon that - it's the long way of saying that a connection may take longer than 20 seconds to establish, so an if-chain and a long wait isn't the end-all there, but this nitpick is not your problem. Yet. And the code structure is not enough to prove there's no statemachine already there in your code; the current code layout is a (very) weak hint, 's all.
>
> Anyway, a tip: this is generic TCP/IP socket programming we're talking about here and do yourself a favor and get a hold of the books by W Richard Stevens (R.I.P.). It's what I grew up with and those books of his have been among the very few which have never let me down in the hour of need. They're still 100% applicable and for IPv6 specifics, there's little enough change that the internet and manpages suffice for that. They are not WinSock specific, but for that one's peculiarities (such as the WSASelect limits) there's MSDN.
>
> Take care,
>
> Ger
Re: WSAEWOULDBLOCK versus WSAECONNREFUSED
Here is what MSDN says:

http://msdn.microsoft.com/en-us/library/ms737625%28VS.85%29.aspx

    With a nonblocking socket, the connection attempt cannot be completed immediately. In this case, *connect* will return SOCKET_ERROR, and *WSAGetLastError* will return WSAEWOULDBLOCK (http://msdn.microsoft.com/en-us/library/ms740668%28VS.85%29.aspx#winsock.wsaewouldblock_2). In this case, there are three possible scenarios:

    - Use the *select* (http://msdn.microsoft.com/en-us/library/ms740141%28VS.85%29.aspx) function to determine the completion of the connection request by checking to see if the socket is writeable.

And here is what man connect says on UNIX:

    *EINPROGRESS* The socket is non-blocking and the connection cannot be completed immediately. It is possible to select(2) (http://man-wiki.net/index.php/2:select) or poll(2) (http://man-wiki.net/index.php/2:poll) for completion by selecting the socket for writing. After select(2) indicates writability, use getsockopt(2) (http://man-wiki.net/index.php/2:getsockopt) to read the *SO_ERROR* option at level *SOL_SOCKET* to determine whether *connect*() completed successfully (*SO_ERROR* is zero) or unsuccessfully (*SO_ERROR* is one of the usual error codes listed here, explaining the reason for the failure).

So it seems that after select I need to check for writability, which is what I have been doing.

Just as a test I started checking for read/write/error events by changing my code to this:

    fd_set myrset;
    FD_ZERO(&myrset);
    FD_SET(m_sock_fd, &myrset);
    fd_set mywset;
    FD_ZERO(&mywset);
    FD_SET(m_sock_fd, &mywset);
    fd_set myeset;
    FD_ZERO(&myeset);
    FD_SET(m_sock_fd, &myeset);
    status = select(m_sock_fd+1, &myrset, &mywset, &myeset, &tv);
    if (FD_ISSET(m_sock_fd, &myrset)) {
        cout << "read set" << endl << flush;
    }
    if (FD_ISSET(m_sock_fd, &mywset)) {
        cout << "write set" << endl << flush;
    }
    if (FD_ISSET(m_sock_fd, &myeset)) {
        cout << "error set" << endl << flush;
    }

To my surprise, Windows puts the socket in the error set!!! which to my knowledge is not documented.

Once select returns after signaling the error set, I call this:

    getsockopt(m_sock_fd, SOL_SOCKET, SO_ERROR, (char *)(&valopt), &lon)

Now valopt is set to WSAECONNREFUSED.

On the UNIX side, you are right, the connect returns EINPROGRESS, then the select signals the write set [as documented], the getsockopt then gives me ECONNREFUSED.

I think this is a bug on Windows or at least the documentation is wrong. It should say to check on the error set, not on the write set, using the select call...

Thanks. I have solved my problem.

On Sun, Aug 23, 2009 at 6:20 PM, Ger Hobbelt g...@hobbelt.com wrote:

> Probably the difference is due to timing; to get the connection refused response, the client needs to at least transmit a packet and either never see a response (timing out) or receive a RST or (in some cases) an ICMP host-unreachable packet. Any way, the minimum time required to give you the 'connection refused' response is two network travels at the absolute minimum. Which means: it takes time.
>
> 'nonblocking' simply means 'I don't want to wait', so theoretically the IP stack doesn't have to wait even a milli/micro/femtosecond before returning to you. Which is what winsock does: queue request, instant return, hence wouldblock.
>
> Apparently (and this can be due to a myriad of reasons) your UNIX box either already 'knows' the server is down or has an IP stack which waits a short while before returning from connect, even in nonblocking conditions, so it is able to deliver a 'connection refused' to you on the initial return.
>
> > It is true that by using a nonblocking connect I want an instant answer but most importantly I want a correct answer.
>
> The answer is correct, because the answer is depending on time. This may sound weird to you as it looks like the whole time/async behaviour pattern here are what's troubling you, but the nonblocking means the 'wouldblock' behaviour is the behaviour you would /expect/ from the system; the fact that the other machine isn't doing this is, at best, something to frown and ponder. /That/ one worries me, not the Win box. How can that machine know /instantaneously/ that the server at other end of the wire is down? There's no paranormal vibe there, so... is its IP stack implemented to wait /anyway/ (bad!), despite the fact that it knows it's got a nonblocking request from the application? Is it at all /aware/ that this is a
SSL_shutdown crashes a solaris server
Hi,

I have an SSL server that runs fine on Windows and on Linux. However on Solaris and IBM platforms, if the client is a Windows one, my server crashes when the client exits.

I was able to track the crash to the second call of SSL_shutdown below in my server code:

    r = SSL_shutdown(ssl);
    if (!r) {
        shutdown(s, 1);
        r = SSL_shutdown(ssl);
    }

Does anyone know why SSL_shutdown would crash?

The OpenSSL documentation says this about the return value of SSL_shutdown:

    *0* The shutdown is not yet finished. Call SSL_shutdown() for a second time, if a bidirectional shutdown shall be performed. The output of SSL_get_error(3) (http://www.openssl.org/docs/ssl/SSL_get_error.html) may be misleading, as an erroneous SSL_ERROR_SYSCALL may be flagged even though no error occurred.

I did verify that at the time of the crash, the returned value is 0. Once the second SSL_shutdown is called, the server is dead.

If the client is anything other than a Windows client, then this problem does not occur!!...

I am using: openssl-0.9.8e

Thanks for any help.
Re: unexpected SSL_ERROR_ZERO_RETURN
I have solved my problem.

The problem in my case was a server one. I use a non-blocking socket for the server to receive information from the clients, so the server performs a select with a timeout of 1 second to read information. It turns out that when there are network issues, 1 second is not enough and the select times out with a 0 return value, so the server was assuming that the client was gone and closed the connection.

Now, when the select times out I check the errno and if it is EAGAIN, I try select again... This solved the problem and all clients are now handled correctly...

What confused me is the man select documentation, which states that select sets errno only if the return value is -1... it seems that even if it returns 0 errno might be set...

I have seen this problem on Linux and HP platforms...

Thanks

On Wed, Jan 7, 2009 at 1:44 PM, Andrey Koltsov kolt...@cyberplat.com wrote:

> I have the same problem with my client OpenSSL application. The server side is MS IIS. And all other parties use Microsoft based clients and have no such problems. It seems that the client side is the source of trouble, not the server side. Suggestions from anyone are welcome.
>
> > Hi,
> >
> > I have an SSL server handling many clients successfully using openssl-0.9.8e. From time to time however, there are some clients that fail to connect to it. Debugging shows that the problem happens when the client attempts the first SSL_read, which unexpectedly returns 0. Checking then for the SSL error shows that it has the value SSL_ERROR_ZERO_RETURN. According to the SSL documentation this should happen only if the SSL connection has been closed.
> >
> > I do know that my server is not closing it since it is handling many other clients correctly. I also know that for the clients facing this problem, the handshake phase is done correctly; it is only when the first SSL_read happens that somehow the connection is dropped. I have no idea why, if anyone can help me.
> >
> > Thanks
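The distinction at the heart of this fix, a wait that merely timed out versus a peer that really closed, can be sketched as below. This is an illustration at the plain TCP level under assumptions: POSIX poll() and recv() with MSG_PEEK, and hypothetical names check_peer and peer_state. With OpenSSL the actual read would go through SSL_read and SSL_get_error rather than recv(). Note that POSIX only defines errno after a call has failed, so a value seen after select() or poll() returned 0 may simply be left over from an earlier call.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical result type: what one wait on a client socket revealed. */
enum peer_state { PEER_IDLE, PEER_HAS_DATA, PEER_CLOSED, PEER_ERROR };

/* Wait up to timeout_ms for activity on fd and classify the outcome. */
enum peer_state check_peer(int fd, int timeout_ms)
{
    struct pollfd p;
    char byte;
    ssize_t n;
    int r;

    p.fd = fd;
    p.events = POLLIN;
    p.revents = 0;
    do {
        r = poll(&p, 1, timeout_ms);    /* a retry after EINTR restarts the wait */
    } while (r < 0 && errno == EINTR);

    if (r == 0)
        return PEER_IDLE;               /* timeout: the peer is quiet, not gone */
    if (r < 0)
        return PEER_ERROR;

    /* Peek one byte without consuming it to tell data from end-of-stream. */
    n = recv(fd, &byte, 1, MSG_PEEK);
    if (n > 0)
        return PEER_HAS_DATA;
    if (n == 0)
        return PEER_CLOSED;             /* orderly close by the peer */
    if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
        return PEER_IDLE;               /* spurious wakeup: treat as quiet */
    return PEER_ERROR;
}
```

Only PEER_CLOSED and PEER_ERROR would justify dropping the client; PEER_IDLE just means waiting again.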
unexpected SSL_ERROR_ZERO_RETURN
Hi,

I have an SSL server handling many clients successfully using openssl-0.9.8e. From time to time however, there are some clients that fail to connect to it. Debugging shows that the problem happens when the client attempts the first SSL_read, which unexpectedly returns 0. Checking then for the SSL error shows that it has the value SSL_ERROR_ZERO_RETURN. According to the SSL documentation this should happen only if the SSL connection has been closed.

I do know that my server is not closing it since it is handling many other clients correctly. I also know that for the clients facing this problem, the handshake phase is done correctly; it is only when the first SSL_read happens that somehow the connection is dropped. I have no idea why, if anyone can help me.

Thanks

Here is what the documentation says about SSL_ERROR_ZERO_RETURN:

    /* The TLS/SSL connection has been closed. If the protocol version is SSL 3.0 or TLS 1.0, this result code is returned only if a closure alert has occurred in the protocol, i.e., if the connection has been closed cleanly. Note that in this case SSL_ERROR_ZERO_RETURN does not necessarily indicate that the underlying transport has been closed. */
Re: client crash or network issue?
Thank you again David,

It seems that now I understand all the crash scenarios and my server can deal with them correctly. Thank you for your guidance.

As for the network issue scenarios, here are some details about the last case:

1) The server is running on UNIX, the client is running on Windows or UNIX. Unplug the client or the server. The server does not report anything! It does not detect that its connection to the client is lost... SSL_read is not even called because my select does not detect any change on the socket. This surprised me.

2) The server is running on Windows, the client is either on Windows or on UNIX.

2.1) Unplug the server. It reports ECONNRESET. This is probably the bug you are talking about. How should I go about checking the interface here? Any specific APIs to use? My server is in C/C++. Thanks.

2.2) Unplug the client. My server reports nothing, similar to 1). This again surprised me but I am by no means an expert on sockets.

While I can work around 2.1) by checking the interface as you suggested, I am at a loss with 2.2) and 1), because now my server has a situation where clients are no longer connected but it does not even know it...

My server does a select on each client socket to wait for incoming messages, so I was hoping that a network disconnection is also information that the select should detect, but apparently this is not the case...

Is there a way my server can be notified? If not, is there a way my server can proactively look for such clients? I am concerned about my server CPU usage in case there are too many disconnected clients...

The reason I need all of this is that my server is using some important resources for each client. If a client connects and then there is a network issue, the client might have finished its work and exited, but the server is still using the resources...

Thanks again for all your help.

On Tue, Nov 4, 2008 at 8:37 PM, David Schwartz [EMAIL PROTECTED] wrote:

> > Thanks David. Unfortunately option 1) and 3) are not possible for my clients.
>
> In other words, you cannot engineer a sensible option and have to fake it. That's fine, but solutions that aren't engineered tend to be poor.
>
> > option 2) seems the way to go for me, but so far it proved unreliable.
>
> That was the downside of that option.
>
> > Here are some scenarios I have been playing with:
> >
> > 1) Crash a client running on unix: The SSL_read returns 0. The SSL error code is SSL_ERROR_SYSCALL [An SSL I/O error occurred]. The errno is 0!
>
> Seems reasonable. No unread data was pending, so the TCP connection closed normally. You would definitely infer a crash in this case. Network failures don't normally close connections.
>
> > 2) Crash a client running on windows: The SSL_read returns -1. The SSL error code is SSL_ERROR_SYSCALL [An SSL I/O error occurred]. The errno is ECONNRESET [Connection reset by peer]
>
> So there was some pending unread data in this case. You would definitely infer a crash in this case. A network failure won't reset a connection, but a rebooting host might. So you can't be sure the client didn't crash.
>
> > 3) Leave the client running on unix or on windows and unplug the network: The SSL_read returns -1. The SSL error code is SSL_ERROR_SYSCALL [An SSL I/O error occurred]. The errno is ECONNRESET [Connection reset by peer]
>
> Did you unplug the client or server? Was the server running Windows? You need to explain this case in detail. If you unplugged the *server* interface, then that's a very unusual special case that you need to specifically test for by checking the interface. (Due to an unfortunate Windows bug. It reports ECONNRESET when it loses a network interface even though the connection was *not* reset by the peer.)
>
> > As you can see this does not seem to be reliable to distinguish between what really happened.
>
> The first two cases seem perfectly sensible. You didn't explain the third case in nearly enough detail for me to comment on it.
>
> DS
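On the question of how a server could notice clients that vanished silently: one common approach, which is not mentioned in the thread and is offered here only as a sketch, is TCP keepalive. With it enabled, the kernel probes an idle connection, and a dead peer eventually surfaces as an error (typically ETIMEDOUT) on the socket, which makes select() return. The helper name enable_keepalive and its parameters are hypothetical; SO_KEEPALIVE is standard, while TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are assumed to exist only where the headers define them (Linux does).

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: turn on TCP keepalive for fd.
 * idle_sec: idle time before the first probe; interval_sec: gap between
 * probes; probes: unanswered probes before the connection is declared dead.
 * The three tuning values are applied only where the platform offers them.
 * Returns 0 on success, -1 if a setsockopt() call failed. */
int enable_keepalive(int fd, int idle_sec, int interval_sec, int probes)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;

#if defined(TCP_KEEPIDLE) && defined(TCP_KEEPINTVL) && defined(TCP_KEEPCNT)
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_sec, sizeof idle_sec) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_sec, sizeof interval_sec) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof probes) < 0)
        return -1;
#else
    /* No per-socket tuning here: the system-wide defaults apply. */
    (void)idle_sec;
    (void)interval_sec;
    (void)probes;
#endif
    return 0;
}
```

Because the kernel does the probing, this adds no polling work to the server itself. An application-level heartbeat message would be the alternative if keepalive timing cannot be tuned on a given platform.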
Re: client crash or network issue?
Thanks David. Unfortunately options 1) and 3) are not possible for my clients. Option 2) seems the way to go for me, but so far it has proved unreliable. Here are some scenarios I have been playing with:

1) Crash a client running on Unix: SSL_read returns 0. The SSL error code is SSL_ERROR_SYSCALL [An SSL I/O error occurred]. The errno is 0!
2) Crash a client running on Windows: SSL_read returns -1. The SSL error code is SSL_ERROR_SYSCALL [An SSL I/O error occurred]. The errno is ECONNRESET [Connection reset by peer].
3) Leave the client running on Unix or on Windows and unplug the network: SSL_read returns -1. The SSL error code is SSL_ERROR_SYSCALL [An SSL I/O error occurred]. The errno is ECONNRESET [Connection reset by peer].

As you can see, these results are not reliable enough to tell what really happened. Thank you for any other ideas.

On Tue, Nov 4, 2008 at 4:55 PM, David Schwartz [EMAIL PROTECTED] wrote:
> Md Lazreg wrote:
>> Actually the same question is valid even if I am not using SSL sockets. So is there a way to distinguish whether a socket was closed because of a client crash or because of a network issue? If yes, is there an equivalent under SSL sockets?
>
> You have three choices:
>
> 1) Always assume the client might return. Delay returning resources for a reasonable amount of time.
>
> 2) Guess based on the error code. For ECONNRESET, assume the client might come back. For ETIMEDOUT, assume it won't. For an apparently normal close (but at an unexpected time), assume it crashed. You'll be right some fraction of the time, depending on what types of errors happen.
>
> 3) Code a reliable method to tell. For example, code a way to probe if the client machine is still around (perhaps a separate daemon to report presence or report the crash of the client program). Code a proxy on the client (that is reliable enough to 'almost never' crash) that can report the loss of the other end of the proxy (the real client program) or similarly engineer a solution.
>
> DS
> __
> OpenSSL Project http://www.openssl.org
> User Support Mailing List openssl-users@openssl.org
> Automated List Manager [EMAIL PROTECTED]
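David's option 2 (guessing from the error code) could be sketched in C roughly as follows. This is only an illustration of the heuristic as he describes it: the function name guess_peer_state and the enum are hypothetical, and the inputs are assumed to be the SSL_read return value and the errno captured right after SSL_get_error reported SSL_ERROR_SYSCALL. As the scenarios reported above show, the guess will sometimes be wrong, since a crashed Windows client and an unplugged cable both produced ECONNRESET.

```c
#include <assert.h>
#include <errno.h>

/* Possible guesses about the peer; the names are illustrative only. */
enum peer_guess {
    PEER_MAY_RETURN,     /* keep the client's resources for a while */
    PEER_WONT_RETURN,    /* release the client's resources          */
    PEER_LIKELY_CRASHED, /* release the client's resources          */
    PEER_UNKNOWN         /* no rule applies; caller must decide     */
};

/*
 * Apply the "guess based on the error code" heuristic.
 * read_ret is the value returned by the failed read (0 or negative);
 * saved_errno is errno captured immediately after that call.
 */
enum peer_guess guess_peer_state(int read_ret, int saved_errno)
{
    assert(read_ret <= 0); /* only meaningful after a failed read */

    if (read_ret == 0)
        return PEER_LIKELY_CRASHED; /* apparently normal close at an unexpected time */

    switch (saved_errno) {
    case ECONNRESET:
        return PEER_MAY_RETURN;
    case ETIMEDOUT:
        return PEER_WONT_RETURN;
    default:
        return PEER_UNKNOWN;
    }
}
```

Keeping the decision in one small function makes it easy to adjust the rules as more failure cases are observed on each platform.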
Re: client crash or network issue?
Actually the same question is valid even if I am not using SSL sockets. So is there a way to distinguish whether a socket was closed because of a client crash or because of a network issue? If yes, is there an equivalent under SSL sockets?

Thanks

On Wed, Oct 29, 2008 at 2:09 PM, Md Lazreg [EMAIL PROTECTED] wrote:
> Hi
>
> I have an SSL server where clients are connecting, requesting whatever they need, then shutting down...
>
> The problem I have is that some clients will not shut down correctly, so my SSL server needs to handle the client disconnection.
>
> There are two types of disconnections:
>
> 1) The client crashed and was not able to send a clean shutdown ---> In this case my SSL server needs to reclaim whatever resources this client was using.
>
> 2) The client did not crash and is still running, but there was some network issue [a cable unplugged, for example] --> In this case my SSL server should not release the resources for that client and should wait for it to reconnect once the network issue is solved.
>
> The problem I have is that my SSL server is not able to distinguish between the two situations. My SSL_read returns -1 in both cases... and a call to SSL_get_error(m_ssl, r) returns SSL_ERROR_SYSCALL.
>
> Is there a way I can distinguish between the two different situations?
>
> Thanks a lot.
client crash or network issue?
Hi

I have an SSL server where clients are connecting, requesting whatever they need, then shutting down...

The problem I have is that some clients will not shut down correctly, so my SSL server needs to handle the client disconnection.

There are two types of disconnections:

1) The client crashed and was not able to send a clean shutdown ---> In this case my SSL server needs to reclaim whatever resources this client was using.

2) The client did not crash and is still running, but there was some network issue [a cable unplugged, for example] --> In this case my SSL server should not release the resources for that client and should wait for it to reconnect once the network issue is solved.

The problem I have is that my SSL server is not able to distinguish between the two situations. My SSL_read returns -1 in both cases... and a call to SSL_get_error(m_ssl, r) returns SSL_ERROR_SYSCALL.

Is there a way I can distinguish between the two different situations?

Thanks a lot.
Re: SSL_accept hangs
On Thu, Mar 20, 2008 at 9:29 PM, David Schwartz [EMAIL PROTECTED] wrote:
> To Md Lazreg: I think I found it.

I think you did find it. Now I am able to process more than 1000 clients without hanging. This is great. Thanks a lot for your expertise.
Re: SSL_accept hangs
Thanks Steve. If this helps anyone fix this issue, here is the backtrace once SSL_accept hangs:

    SSL_accept
    ssl23_accept
    ssl23_get_client_hello
    ssl23_read_bytes
    BIO_read
    sock_read
    __read_nocancel

Thanks

On Thu, Mar 20, 2008 at 8:22 AM, Steve West [EMAIL PROTECTED] wrote:
> We experienced a similar problem and had to back rev to 9.8.d
>
> Steve
>
> - Original Message -
> From: Md Lazreg [EMAIL PROTECTED]
> To: openssl-users@openssl.org
> Sent: Wednesday, March 19, 2008 8:04 PM
> Subject: SSL_accept hangs
>
> Hi, I have set up an SSL server that works fine up to 400 connected clients. When I try to have more than 400 clients, my server hangs in the SSL_accept call. This happens very randomly, sometimes beyond 1000 connected clients... The server is dead once this happens and no other client can connect. Please note that I am using non blocking sockets so SSL_accept _should_ return... but for whatever reason it does not. I am using openssl-0.9.8e. Any suggestions please? Thanks
Re: SSL_accept hangs
Hi David,

My code looks like this:

     1  while (1)
     2  {
     3      r = SSL_accept(m_ssl);
     4      if (r > 0)
     5      {
     6          break;
     7      }
     8      r = ssl_retry(r);
     9      if (r <= 0)
    10      {
    11          break;
    12      }
    13  }

The issue is not that it is going into an infinite while loop. The issue is that SSL_accept on line 3 never returns! My socket is a non blocking one, so as far as I know SSL_accept should return.

A backtrace shows that when this happens the server gets stuck in:

    SSL_accept
    ssl23_accept
    ssl23_get_client_hello
    ssl23_read_bytes
    BIO_read
    sock_read
    __read_nocancel

after calling SSL_accept.

Thanks

On Thu, Mar 20, 2008 at 11:44 AM, David Schwartz [EMAIL PROTECTED] wrote:
>> Hi, I have set up an SSL server that works fine up to 400 connected clients. When I try to have more than 400 clients, my server hangs in the SSL_accept call. This happens very randomly, sometimes beyond 1000 connected clients... The server is dead once this happens and no other client can connect. Please note that I am using non blocking sockets so SSL_accept _should_ return... but for whatever reason it does not.
>
> What is your code *supposed* to do if SSL_accept bails out of accept immediately with EMFILE? If you keep looping and calling SSL_accept forever, then your code is going to loop forever.
>
>     ret=accept(sock,(struct sockaddr *)&from,(void *)&len);
>     if (ret == INVALID_SOCKET)
>     {
>         if(BIO_sock_should_retry(ret)) return -2;
>         SYSerr(SYS_F_ACCEPT,get_last_socket_error());
>         BIOerr(BIO_F_BIO_ACCEPT,BIO_R_ACCEPT_ERROR);
>         goto end;
>     }
>
> DS
Re: SSL_accept hangs
Hi David,

On Thu, Mar 20, 2008 at 12:38 PM, David Schwartz [EMAIL PROTECTED] wrote:
>> Hi David, My code looks like this:
>>
>>      1  while (1)
>>      2  {
>>      3      r = SSL_accept(m_ssl);
>>      4      if (r > 0)
>>      5      {
>>      6          break;
>>      7      }
>>      8      r = ssl_retry(r);
>>      9      if (r <= 0)
>>     10      {
>>     11          break;
>>     12      }
>>     13  }
>
> Well, that's obviously badly broken. It's probably not precisely your issue, but it's related. Since the socket is non blocking, there is no place for this code to block waiting for the connection!

Well, that is not true, and I am sorry I did not give you the full code, as it is quite complicated. The snippet you see above is called after a new connection has already been accepted. I have an outer loop that does a select, and once a new connection is detected and accepted without errors, I go ahead and establish the SSL part. Something like:

    ready_sockets = ::select(m_max_socket + 1, &rfds, 0, 0, &tv);
    if (ready_sockets > 0)
    {
        if (FD_ISSET(s->get_sock(), p->get_rfds()))
        {
            new_s->set_non_blocking(true);
            if (s->accept(new_s))
            {
                /* call the code above, which will call SSL_accept */
            }
            else
            {
                /*error handling*/
            }

So when SSL_accept is called I already know that accept succeeded and no EMFILE or ENFILE was generated.

I am setting the socket as non blocking by simply calling:

    if (fcntl(m_sock_fd, F_SETFL, O_NONBLOCK) == -1)
    {
        return false;
    }

I am confused when you ask whether my BIO is non-blocking too. I thought that it is non blocking since the underlying socket is non blocking. Is this a wrong assumption? If so, how can I make the BIO non blocking [BIO_set_nbio?]

Thank you for your help.

>> The issue is not that it is going into an infinite while loop.
>
> That's just pure luck.
>
>> The issue is that SSL_accept on line 3 never returns! My socket is a non blocking one, so as far as I know SSL_accept should return.
>
> How did you make it non blocking exactly? And is the BIO non-blocking too?
>
>> A backtrace shows that when this happens the server gets stuck in: SSL_accept after calling SSL_accept.
>
> Sounds like you're lucky. The BIO is actually blocking and that's saving your code from looping. At least you're not burning the CPU. ;)
>
> What is your design intention if 'accept' returns EMFILE or ENFILE? If your answer is "I have no idea" or "I never really thought about it", then it's no surprise your code mishandles this case.
>
> DS
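The exchange above turns on where a non-blocking handshake loop is allowed to wait. A common pattern is to wait on the descriptor with select, bounded by a timeout, before retrying the call that reported it wanted to read or write. The following is a minimal C sketch of such a wait; wait_for_fd and wait_for_fd_demo are hypothetical helper names, not part of the poster's code or of OpenSSL, and the sketch restarts the full timeout after EINTR as a simplification.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Wait until fd is readable (want_write == 0) or writable (want_write != 0).
 * Returns 1 if ready, 0 on timeout, -1 on error.
 */
int wait_for_fd(int fd, int want_write, long timeout_ms)
{
    assert(fd >= 0 && fd < FD_SETSIZE); /* select cannot watch larger descriptors */

    for (;;) {
        fd_set fds;
        struct timeval tv;

        FD_ZERO(&fds);
        FD_SET(fd, &fds);
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;

        int r = select(fd + 1,
                       want_write ? NULL : &fds,
                       want_write ? &fds : NULL,
                       NULL, &tv);
        if (r > 0)
            return 1;  /* descriptor is ready */
        if (r == 0)
            return 0;  /* timed out */
        if (errno != EINTR)
            return -1; /* real error */
        /* interrupted by a signal: wait again */
    }
}

/* Usage example on a pipe; returns 0 if the three waits behave as expected. */
int wait_for_fd_demo(void)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;

    int empty = wait_for_fd(p[0], 0, 50);    /* nothing to read yet */
    int writable = wait_for_fd(p[1], 1, 50); /* pipe buffer has room */
    int wrote = (write(p[1], "x", 1) == 1);
    int readable = wait_for_fd(p[0], 0, 50); /* one byte is pending */

    close(p[0]);
    close(p[1]);
    return (empty == 0 && writable == 1 && wrote && readable == 1) ? 0 : -1;
}
```

In a handshake loop, the read variant would be used when the library reports that it wants to read and the write variant when it wants to write, with an overall deadline kept by the caller.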
Re: SSL_accept hangs
On Thu, Mar 20, 2008 at 6:51 PM, David Schwartz [EMAIL PROTECTED] wrote:
>>     ready_sockets = ::select(m_max_socket + 1, &rfds, 0, 0, &tv);
>>     if (ready_sockets > 0)
>>     {
>>         if (FD_ISSET(s->get_sock(), p->get_rfds()))
>>         {
>>             new_s->set_non_blocking(true);
>>             if (s->accept(new_s))
>>             {
>>                 /* call the code above, which will call SSL_accept */
>>             }
>>             else
>>             {
>>                 /*error handling*/
>>             }
>
> Where is the call to 'accept' (the system's 'accept')? Did you cut out a line before 'new_s->set_non_blocking'? Is 's->accept(new_s)' a wrapper around 'accept'? Can you paste the code to this wrapper?

Yes, 's->accept(new_s)' is a wrapper around the system 'accept'. Here is the code for it:

    bool csocket::accept ( csocket * new_socket ) const
    {
        int addr_length = sizeof ( m_addr );
        new_socket->m_sock_fd = ::accept ( m_sock_fd, ( sockaddr * ) &m_addr, ( socklen_t * ) &addr_length );
        if ( new_socket->m_sock_fd <= 0 )
        {
            return false;
        }
        else
        {
            return true;
        }
    }

So as you can see, if accept returns EMFILE or ENFILE, I go immediately to the error handling section.

I have added a BIO_set_nbio call to my code following your advice:

    m_sbio = BIO_new_socket(m_sock_fd, BIO_NOCLOSE);
    BIO_set_nbio(m_sbio,1);
    SSL_set_bio(m_ssl, m_sbio, m_sbio);

Unfortunately this did not make a difference and SSL_accept still hangs, sometimes after processing more than 1000 clients...

Thanks again.

>> I am setting the socket as non blocking by simply calling:
>>
>>     if (fcntl(m_sock_fd, F_SETFL, O_NONBLOCK) == -1)
>>     {
>>         return false;
>>     }
>
> This does not make the BIO non-blocking. That may or may not matter, but to tell I need to see where the actual call to the system's 'accept' function is taking place. And you still haven't pasted that code.
>
>> I am confused when you ask whether my BIO is non-blocking too. I thought that it is non blocking since the underlying socket is non blocking. Is this a wrong assumption? If so, how can I make the BIO non blocking [BIO_set_nbio?]
>
> Right. A blocking BIO with a non-blocking socket can cause serious problems. Where is the actual call to 'accept' to accept the connection? What happens if 'accept' returns EMFILE or ENFILE?
>
> DS
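The thread does not record what the eventual fix was. One thing worth checking in code shaped like the above is the order of operations: the wrapper stores the descriptor returned by accept into m_sock_fd, so a set_non_blocking call made before accept would apply fcntl to whatever m_sock_fd held earlier, not to the new connection. The following is a minimal C sketch under that assumption; set_nonblocking and accept_nonblocking are hypothetical helper names, not the poster's code and not OpenSSL functions.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/* Add O_NONBLOCK to fd while keeping its other status flags. Returns 0 or -1. */
int set_nonblocking(int fd)
{
    assert(fd >= 0);

    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/*
 * Accept one connection and only then make the new descriptor non-blocking.
 * Returns the new descriptor, or -1 on failure.
 */
int accept_nonblocking(int listen_fd)
{
    int fd = accept(listen_fd, NULL, NULL);
    if (fd < 0)    /* 0 is a valid descriptor, so only negative values are errors */
        return -1; /* errno may be EMFILE, ENFILE, EAGAIN, ... */

    if (set_nonblocking(fd) == -1) {
        close(fd); /* do not leak the descriptor */
        return -1;
    }
    return fd;
}
```

With the flag set on the accepted descriptor, a read that has nothing to deliver returns -1 instead of sleeping inside the kernel, which is the behaviour the handshake loop relies on.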
SSL_accept hangs
Hi, I have set up an SSL server that works fine up to 400 connected clients. When I try to have more than 400 clients, my server hangs in the SSL_accept call. This happens very randomly, sometimes beyond 1000 connected clients... The server is dead once this happens and no other client can connect. Please note that I am using non blocking sockets so SSL_accept _should_ return... but for whatever reason it does not. I am using openssl-0.9.8e. Any suggestions please? Thanks
Re: d2i_X509 segmentation violation
Hi, I was able to fix the issue by adding the following flags to my openssl configure step:

    no-asm no-shared -fPIC -DPIC

I do not know why this fixes the SIGSEGV but it does.

On Jan 23, 2008 12:37 AM, Md Lazreg [EMAIL PROTECTED] wrote:
> Hi, I have the following code:
>
> ---
> unsigned char SERVER_certificate[1406]={
> 0x30,0x82,0x05,0x7A,0x30,0x82,0x03,0x62,0x02,0x01,0x01,0x30,0x0D,0x06,0x09,0x2A,
> :
> :
> 0xb4, 0x78, 0xc6, 0x5a, 0x2d, 0x4c, 0xf9, 0xde, 0x7a };
> const unsigned char * p = SERVER_certificate;
> X509 * server_cert = d2i_X509(NULL,&p,sizeof(SERVER_certificate));
> ---
>
> It works on all platforms except on a machine as follows:
>
> cat [EMAIL PROTECTED] /etc/issue
> Red Hat Enterprise Linux AS release 4 (Nahant Update 2)
> Kernel \r on an \m
> uname -a
> Linux bromden 2.6.9-22.EL #1 SMP Mon Sep 19 17:54:55 EDT 2005 ia64 ia64 ia64 GNU/Linux
>
> In such a configuration it crashes in the d2i_X509 function with a segmentation violation!
>
> The same code works on:
>
> uname -a
> Linux unagi 2.6.5-7.97-default #1 SMP Fri Jul 2 14:21:59 UTC 2004 ia64 ia64 ia64 GNU/Linux
> cat /etc/issue
> Welcome to SUSE LINUX Enterprise Server 9 (ia64) - Kernel \r (\l).
>
> Any ideas please why d2i_X509 does not work on redhat 4 ia64? Thanks
d2i_X509 segmentation violation
Hi, I have the following code:

---
unsigned char SERVER_certificate[1406]={
0x30,0x82,0x05,0x7A,0x30,0x82,0x03,0x62,0x02,0x01,0x01,0x30,0x0D,0x06,0x09,0x2A,
:
:
0xb4, 0x78, 0xc6, 0x5a, 0x2d, 0x4c, 0xf9, 0xde, 0x7a };
const unsigned char * p = SERVER_certificate;
X509 * server_cert = d2i_X509(NULL,&p,sizeof(SERVER_certificate));
---

It works on all platforms except on a machine as follows:

cat [EMAIL PROTECTED] /etc/issue
Red Hat Enterprise Linux AS release 4 (Nahant Update 2)
Kernel \r on an \m
uname -a
Linux bromden 2.6.9-22.EL #1 SMP Mon Sep 19 17:54:55 EDT 2005 ia64 ia64 ia64 GNU/Linux

In such a configuration it crashes in the d2i_X509 function with a segmentation violation!

The same code works on:

uname -a
Linux unagi 2.6.5-7.97-default #1 SMP Fri Jul 2 14:21:59 UTC 2004 ia64 ia64 ia64 GNU/Linux
cat /etc/issue
Welcome to SUSE LINUX Enterprise Server 9 (ia64) - Kernel \r (\l).

Any ideas please why d2i_X509 does not work on redhat 4 ia64? Thanks
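As a quick sanity check on the length passed to d2i_X509, the four leading bytes shown above (0x30, 0x82, 0x05, 0x7A) already encode the certificate's total size: a SEQUENCE whose content is 0x057A = 1402 bytes, plus 4 header bytes, gives 1406, which matches the declared array size. The helper below is a hypothetical illustration of that arithmetic, not an OpenSSL function; it assumes a single-byte tag and at most four length bytes.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the total encoded size (header + content) of the DER element that
 * starts at buf, or 0 if the header is truncated or uses a length form this
 * sketch does not handle (indefinite length, more than 4 length bytes).
 */
size_t der_total_length(const unsigned char *buf, size_t avail)
{
    assert(buf != NULL);

    if (avail < 2)
        return 0;

    unsigned char first = buf[1];
    if (first < 0x80)
        return 2 + (size_t)first;  /* short form: length fits in one byte */

    size_t n = first & 0x7F;       /* long form: number of length bytes */
    if (n == 0 || n > 4 || avail < 2 + n)
        return 0;

    size_t len = 0;
    for (size_t i = 0; i < n; i++)
        len = (len << 8) | buf[2 + i]; /* big-endian length */

    return 2 + n + len;            /* tag + length bytes + content */
}
```

Comparing this value with sizeof(SERVER_certificate) before calling the parser is a cheap way to catch a truncated or mis-sized embedded array.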
public key in the binary
Hi, I am encrypting a file using a private key, and my program is decrypting it using the public key compiled in the binary. The question is how to protect my public key against binary analysis within the binary? I do not want someone to replace it with their own public key and hence encrypting my program's input using their private key. Any ideas please? Thanks
Re: public key in the binary
On 10/3/07, Victor Duchovni [EMAIL PROTECTED] wrote:
> On Wed, Oct 03, 2007 at 10:04:26AM -0500, Md Lazreg wrote:
>> I am encrypting a file using a private key, and my program is decrypting it using the public key compiled in the binary.
>
> Private keys don't encrypt, they sign. The public key *verifies*. If you want to encrypt, you use the public key to encrypt, and the holder of the private key can decrypt.

Private keys do encrypt, using the function: http://www.openssl.org/docs/crypto/RSA_private_encrypt.html

The holder of the private key is me. And it is my application, compiled with my public key, that will decrypt whatever I have encrypted with my private key. My application will behave differently depending on what it finds in the decrypted information.

>> The question is how to protect my public key against binary analysis within the binary? I do not want someone to replace it with their own public key and hence encrypting my program's input using their private key. Any ideas please?
>
> Sorry, keys are protected by OS permissions of separate key files, or by dedicated hardware that provides access to operations that use the key, but not the key itself. If you are protecting data from the user of your application (DRM), you are mostly out of luck.

I just want to make sure the user does not instrument my application by changing the public key compiled within it. Basically I am looking for some mathematical operations that will scatter my public key around my executable to make it hard to figure out.

Thanks

> --
> Viktor.
Re: public key in the binary
On 10/3/07, Victor Duchovni [EMAIL PROTECTED] wrote:
> On Wed, Oct 03, 2007 at 10:42:59AM -0500, Md Lazreg wrote:
>> Private keys do encrypt using the function: http://www.openssl.org/docs/crypto/RSA_private_encrypt.html
>
> Of course they do, but when a private key encrypts, it is called signing, because the public key is presumed to be (drum roll...) public, i.e. not held in confidence exclusively by a single recipient. So encrypting with a private key yields signatures, not confidentiality.

Ok, I understand. Thanks.

>> The holder of the private key is me. And it is my application, compiled with my public key, that will decrypt whatever I have encrypted with my private key. My application will behave differently depending on what it finds in the decrypted information.
>
> Are you signing instructions that the application authenticates, and should ignore if not signed by the right key, or sending confidential data for the eyes of the application only? If you are signing, your model is fine, and embedding the public key in the binary is exactly the right thing to do. If you are encrypting, use a symmetric algorithm; the public key algorithm is just confusing you.

Yes, I am signing. And the application will not work unless it is me who signed the input to it. That is why I do not want someone to change the public key within the application, because if they do they will be able to sign the input using their private key and make my application behave the way they want... I need a way to hide the public key in the binary...

Thanks

> --
> Viktor.
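Viktor's point that "when a private key encrypts, it is called signing" can be illustrated with textbook RSA on deliberately tiny numbers. This is a toy sketch only: the key (n = 3233, e = 17, d = 2753) is a common classroom example, there is no padding or hashing, and mod_pow, toy_sign and toy_verify are hypothetical names rather than OpenSSL calls. It shows the relationship the application depends on: a value produced with the private exponent checks out only against the matching public key and the original input.

```c
#include <assert.h>
#include <stdint.h>

/* Toy key: p = 61, q = 53, n = 3233, e = 17, d = 2753. Far too small for real use. */
enum { TOY_N = 3233, TOY_E = 17, TOY_D = 2753 };

/* Square-and-multiply modular exponentiation; cannot overflow for such a small modulus. */
uint64_t mod_pow(uint64_t base, uint64_t exp, uint64_t mod)
{
    uint64_t result = 1 % mod;
    base %= mod;
    while (exp > 0) {
        if (exp & 1)
            result = (result * base) % mod;
        base = (base * base) % mod;
        exp >>= 1;
    }
    return result;
}

/* "Encrypting with the private key": only the holder of d can produce this value. */
uint64_t toy_sign(uint64_t digest)
{
    assert(digest < (uint64_t)TOY_N); /* the message must be smaller than the modulus */
    return mod_pow(digest, TOY_D, TOY_N);
}

/* Anyone holding the public key (e, n) can check the value. Returns 1 if it matches. */
int toy_verify(uint64_t digest, uint64_t signature)
{
    return mod_pow(signature, TOY_E, TOY_N) == digest;
}
```

The verifier needs nothing secret, which is why embedding the public key is fine for authenticity; what has to be protected is the integrity of the embedded key and of the code that performs the check.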
Re: public key in the binary
On 10/3/07, Victor Duchovni [EMAIL PROTECTED] wrote:
> On Wed, Oct 03, 2007 at 10:57:39AM -0500, Md Lazreg wrote:
>
> Is this DRM? DRM is not possible without trusted hardware, and even then is difficult.

Yes, it is DRM in a way. I know it is not possible to have 100% protection using only software. I am only looking to make it a little bit harder by smartly hiding the public key in the application.

> What problem does preventing the user from fielding a modified application solve?

It solves the problem of preventing the user from running my application in a mode they did not pay for.

Thanks

> --
> Viktor.
Re: public key in the binary
On 10/3/07, David Schwartz [EMAIL PROTECTED] wrote:
>> I need a way to hide the public key in the binary...
>
> You can't ask in public for a good hiding place. Note that your question has *nothing* to do with OpenSSL or even public key encryption for that matter. Your question is basically "how do I make a tamperproof executable".

That is true. The OpenSSL users, however, are the best suited to answer such questions in my opinion. The suggestion by Marek Marcola to get the book "Secure Programming Cookbook for C and C++" is a great one. I have already ordered this book and hopefully I will get some ideas there.

Thanks all for your help.