Re: [PATCH] Do not send unretriable requests on reused pinned connections
On 5/12/2012 9:19 a.m., Alex Rousskov wrote:
> On 12/01/2012 06:02 PM, Alex Rousskov wrote:
>> To summarize, our choices for a pinned connection created for the
>> current client are:
>>
>> Before sending the request:
>>
>> 1. Reopen and repin used pconns to send unretriable requests, just
>> like non-pinned connection code does. This eliminates HTTP pconn race
>> conditions for unretriable requests. This is what the proposed patch
>> does for SslBump.
>>
>> 2. Send all requests without reopening the pinned connection. If we
>> encounter an HTTP pconn race condition, see below.
>>
>> After the request, if an HTTP pconn race happens:
>>
>> 0. Send an error message to the client. This is what we do today and
>> it breaks things.
>>
>> 1. Reset the client connection. Pretend to be a dumb tunnel.
>>
>> 2. Retry, just like the current non-pinned connection code does. This
>> is what the proposed patch implements.
>>
>> Current code does 2+0. That does not work. The proposed patch does
>> 1+2. That works in my limited tests. Henrik suggested 2+1. I agree
>> that it is also an option worth considering.
>
> I am leaning towards implementing 2+1 _and_ removing the current
> connection repinning code as not needed. We could leave repinning code
> in case it is needed later to fix broken clients, but no such clients
> are known at this time, while that code is complex and requires fixes
> (the patch posted here earlier), so I am leaning towards removing it
> completely for now. With some effort it can be re-introduced at a
> later time, of course.
>
> If there are any better ideas, please let me know.

Re-pinning is somewhat useful for the cases of:

* Kerberos/Bearer auth, where credentials on the request are sufficient
  to re-auth the new connection identically to the original one.

* NAT (not TPROXY) intercepted connections to the same destination
  IP:port, where verification can lock it down to that destination, but
  not completely.

In both cases this holds only provided the destination IP:port is the
same as the original one.

Were it not for the required client IP:port combo already being in use,
this would also be true for TPROXY. And were it not for the handshake
hassle, this would also be true for NTLM.

The cases with problems are:

* NTLM, due to the handshake complexity which nobody has implemented
  yet (otherwise this would be in the useful category too, with
  Kerberos).

* TPROXY, where the src+dst tuple has collisions with the old
  connection.

* CONNECT, where we don't know what the state in the data channel
  contains. Bumping should fit into this too if we are to emulate the
  CONNECT properly, even though we do know the state.

Amos
Re: [PATCH] Do not send unretriable requests on reused pinned connections
On 12/06/2012 05:21 AM, Amos Jeffries wrote:
> On 5/12/2012 9:19 a.m., Alex Rousskov wrote:
>> On 12/01/2012 06:02 PM, Alex Rousskov wrote:
>>> To summarize, our choices for a pinned connection created for the
>>> current client are: [...]
>>
>> I am leaning towards implementing 2+1 _and_ removing the current
>> connection repinning code as not needed. We could leave repinning
>> code in case it is needed later to fix broken clients, but no such
>> clients are known at this time, while that code is complex and
>> requires fixes (the patch posted here earlier), so I am leaning
>> towards removing it completely for now. With some effort it can be
>> re-introduced at a later time, of course.
>>
>> If there are any better ideas, please let me know.
>
> Re-pinning is somewhat useful for the cases of:
>
> * Kerberos/Bearer auth, where credentials on the request are
>   sufficient to re-auth the new connection identically to the
>   original one.
>
> * NAT (not TPROXY) intercepted connections to the same destination
>   IP:port, where verification can lock it down to that destination,
>   but not completely.

Hi Amos,

Since repinning has been declared a bug in general, let's leave
developing support for these special cases to those who need them.
Those folks would be in a better position to advocate that their cases
need repinning _and_ properly detect safe repinning conditions,
validate repinned connections, etc. They would be able to reuse most of
our deleted code, of course, but some new code will be needed to take
care of their cases anyway.

The same applies to any other special cases for repinning that we are
not aware of right now. Hopefully, each repinning case will be reviewed
more carefully in the future to avoid removal.

Thank you, Alex.
Re: [PATCH] Do not send unretriable requests on reused pinned connections
On 12/01/2012 06:02 PM, Alex Rousskov wrote:
> To summarize, our choices for a pinned connection created for the
> current client are: [...]
>
> Current code does 2+0. That does not work. The proposed patch does
> 1+2. That works in my limited tests. Henrik suggested 2+1. I agree
> that it is also an option worth considering.

I am leaning towards implementing 2+1 _and_ removing the current
connection repinning code as not needed. We could leave repinning code
in case it is needed later to fix broken clients, but no such clients
are known at this time, while that code is complex and requires fixes
(the patch posted here earlier), so I am leaning towards removing it
completely for now. With some effort it can be re-introduced at a later
time, of course.

If there are any better ideas, please let me know.

Thank you, Alex.
Re: [PATCH] Do not send unretriable requests on reused pinned connections
Sat 2012-12-01 at 18:02 -0700, Alex Rousskov wrote:
> a) If we are talking about unretriable requests, the client already
> violated HTTP by sending such a request over the persistent client
> connection. Some of these clients may fail to handle resets properly.

End-user agents have much more freedom in what is retriable or not.

> b) To implement 2+1 correctly, Squid will need to close the _client_
> connection when the origin server sends Connection: close. We do not
> do that now IIRC.

Yes. If we don't do that today then it's a bug.

Regards
Henrik
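Point (b) above can be sketched as a tiny helper: under a 2+1 scheme,
Squid would close the client connection whenever the pinned origin
server signals `Connection: close`, so the client cannot send another
request over a connection whose pinned server side is gone. The helper
below is hypothetical, invented for illustration; it is not actual
Squid code.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Decide, from a single Connection header token sent by the origin
// server, whether the pinned client connection must also be closed.
// Hypothetical helper, not actual Squid code.
bool serverWantsClose(std::string connectionToken)
{
    // Connection header tokens are case-insensitive.
    std::transform(connectionToken.begin(), connectionToken.end(),
                   connectionToken.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    return connectionToken == "close";
}
```

A real implementation would of course parse the full comma-separated
Connection header list rather than a single token.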
Re: [PATCH] Do not send unretriable requests on reused pinned connections
Fri 2012-11-30 at 23:07 -0700, Alex Rousskov wrote:
> I am not sure what you are asking about, but I can try to rephrase:
> This bug is difficult to fix because some pinned connections should be
> reused and some should not be. Pinned connections that can be
> re-pinned but have not had any HTTP requests sent on them should be
> reused, even for unretriable requests. SslBump creates such
> connections in forward.cc when Squid connects to the origin server to
> peek at the server certificate. Since no HTTP requests were sent on
> such connections at the decision time, this is not really a reuse even
> though it looks like one in all other aspects.

It is. You must take care to not reuse a slightly old (1s or so)
connection under those conditions.

>> Which is quite likely the wrong thing to do. See above.
>
> Does the !flags.canRePin exception address your concern?

Yes, if used where needed (TPROXY, NTLM).

Regards
Henrik
Re: [PATCH] Do not send unretriable requests on reused pinned connections
On 12/01/2012 11:20 AM, Henrik Nordström wrote:
> Fri 2012-11-30 at 23:07 -0700, Alex Rousskov wrote:
>> Does the !flags.canRePin exception address your concern?
>
> Yes, if used where needed (TPROXY, NTLM).

By default, the canRePin flag is not set and pinned connections are
reused, even for unretriable requests. Thus, bare TPROXY and NTLM code
should be fine without any special/additional changes.

However, a combination of TPROXY and SslBump will see the canRePin flag
set (by the SslBump code). We have not heard complaints that the combo
does not work, even though recent SslBump code reopened and repinned
closed server-side connections. Perhaps those bug reports are yet to
come. Why can't TPROXY reopen a server-side connection?

Thank you, Alex.

P.S. Even if we do not handle the TPROXY+SslBump combo correctly today,
the required fix will be outside the proposed patch. The patch is still
needed to handle cases where pinned connections can be reopened and
repinned.
Re: [PATCH] Do not send unretriable requests on reused pinned connections
On 11/30/2012 08:27 PM, Amos Jeffries wrote:
> On 1/12/2012 1:31 p.m., Henrik Nordström wrote:
>> Fri 2012-11-30 at 15:30 -0700, Alex Rousskov wrote:
>>> Squid is sending POST requests on reused pinned connections, and
>>> some of those requests fail due to a pconn race, with no possibility
>>> for a retry.
>>
>> Yes... and we have to for NTLM, TPROXY and friends or they get in a
>> bit of trouble from connection state mismatch. If sending the request
>> fails we should propagate this to the client by resetting the client
>> connection and let the client retry.
>
> It seems to me we are also forced to do this for ssl-bump connections.

The current code does not force it. From a user experience point of
view, resetting the client connection is usually the wrong choice in
this context (it breaks sites/transactions that work correctly with the
current code). Thus, if we disagree whether this should be the default
behavior, somebody will need to add a knob to control it from
squid.conf.

> * Opening a new connection is the wrong thing to do for server-first
>   bumped connections, where the new connection MAY go to a completely
>   different server than the one whose certificate was bumped with. We
>   control the IP:port we connect to, but we cannot control the
>   existence of IP-level load balancers.

Not exactly. Opening a new connection MIGHT be wrong (not IS wrong). In
all real use cases we know of today, it is actually the right thing to
do. So far, we have found no cases where a load balancer or a similar
device sends the request to a semantically different server. I am not
saying such cases do not exist, but they are definitely not the norm.

Please note that Squid checks the certificate of the new server when
the connection is repinned. We do not just trust that the new server
certificate is going to be similar. IIRC, this has been discussed in
the past, when bump-server-first patches were reviewed.

> * client-first bumped connections do not face the lag, *BUT* there is
>   no way to identify them at forwarding time separately from
>   server-first bumped.

The lag is not the primary issue here. Pconn races are. All SslBump
connections suffer from those. We can discuss the lag separately. Since
reconnecting after a 1 second lag may face the same lag, it may not
really help. I would not object to adding an optional feature to
reconnect after X second lag, but I recommend waiting for a real use
case to justify that complication (and to help us detail the fix
correctly).

> * we are pretending to be a dumb relay - which offers the ironclad
>   guarantee that the server at the other end is a single TCP endpoint
>   (DNS uncertainty is only on the initial setup. Once connected,
>   packets reach *an* endpoint they all do or the connection dies).

Conceptually, yes. In practice, "we break this popular site" reasons
require us to make implementation adjustments to that nice theoretical
concept of a dumb relay.

> We can control the outgoing IP:port details, but have no control over
> the existence of IP-level load balancers which can screw with the
> destination server underneath us. Gambling on the destination not
> changing on an HTTPS outbound when retrying for intercepted traffic
> will re-open at least two CVE issues 3.2 is supposed to be immune to
> (CVE-2009-0801 and CVE-2009-3555).

We do not gamble. We verify that the new destination has a similar
certificate.

> Races are also still very possible on server-bumped connections if for
> any reason it takes longer to receive+parse+adapt+reparse the client
> request than the server wants to wait for.

Yes, but that would not be a pconn race. Let's not enlarge the scope of
the problem in the context of this patch review. The proposed patch
fixes pconn races (with real use cases behind the fix!). It does not
pretend to fix something else.

> A straight usage counter is definitely the wrong thing to use to
> control this, whether or not you agree with us that re-trying outbound
> connections is safe after guaranteeing the client (with encryption
> certificate no less) that a single destination has been set up. What
> is needed is a suitable-length idle timeout and a close handler.

The usage counter is the right approach to detect pconn races, by
definition: a pconn race happens _after_ the first HTTP connection use.
The counter simply helps us detect when we have used the HTTP
connection at least once.

> The issue is not that the conn was used then pooled versus pinned. The
> issue is the async period between the last and current packet on the
> socket - we have no way to identify if the duration between them has
> caused problems (crtd, adaptation or ACL lag might be enough to die
> from some race with NAT timeouts). Whether that past use was the SSL
> exchange (server-bump only) or a previous HTTP data packet.

I am not trying to fix all possible race conditions. The patch is
trying to fix just one: an HTTP pconn race. Other fixes would be
welcomed, of course, especially if backed by real use cases/need.

I agree
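The use-counter position above can be sketched minimally: a pconn race
can, by definition, only happen on a connection that has already
carried at least one HTTP request, so a counter of past uses is exactly
the signal needed. The struct and member names below are invented for
illustration and are not Squid internals.

```cpp
// Minimal model of a pinned server connection and the use counter that
// detects pconn-race eligibility. Illustrative only, not Squid code.
struct PinnedConnection {
    unsigned useCount = 0; // HTTP requests already sent on this connection

    void noteRequestSent() { ++useCount; }

    // A freshly pinned connection (e.g. one SslBump opened just to peek
    // at the server certificate) has useCount == 0 and is safe even for
    // unretriable requests; a used connection may race like any idle
    // pconn.
    bool pconnRacePossible() const { return useCount > 0; }
};
```

This deliberately models only the race-detection question, not the
separate idle-timeout concern Amos raises.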
Re: [PATCH] Do not send unretriable requests on reused pinned connections
Hello,

It just occurred to me that there is nothing wrong, from a theoretical
protocol point of view, with resetting the client connection when a
server-side race condition is detected _and_ we know that the client
was the one that caused Squid to open the server connection. I
personally do not know whether there are use cases that would be broken
by that.

To summarize, our choices for a pinned connection created for the
current client are:

Before sending the request:

1. Reopen and repin used pconns to send unretriable requests, just like
   non-pinned connection code does. This eliminates HTTP pconn race
   conditions for unretriable requests. This is what the proposed patch
   does for SslBump.

2. Send all requests without reopening the pinned connection. If we
   encounter an HTTP pconn race condition, see below.

After the request, if an HTTP pconn race happens:

0. Send an error message to the client. This is what we do today and it
   breaks things.

1. Reset the client connection. Pretend to be a dumb tunnel.

2. Retry, just like the current non-pinned connection code does. This
   is what the proposed patch implements.

Current code does 2+0. That does not work. The proposed patch does 1+2.
That works in my limited tests. Henrik suggested 2+1. I agree that it
is also an option worth considering. Will it break actual clients? I do
not know, but:

a) If we are talking about unretriable requests, the client already
   violated HTTP by sending such a request over the persistent client
   connection. Some of these clients may fail to handle resets
   properly.

b) To implement 2+1 correctly, Squid will need to close the _client_
   connection when the origin server sends Connection: close. We do not
   do that now IIRC.

Thank you, Alex.
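The choice matrix above can be sketched as a pair of small decision
functions. All type and function names below are invented for this
sketch; none of it is actual Squid code.

```cpp
// Illustrative model of the before/after choice matrix discussed in
// the thread. Not actual Squid code.

enum class BeforeSend { ReopenAndRepin, SendOnPinned };   // choices 1 and 2
enum class OnPconnRace { SendError, ResetClient, Retry }; // choices 0, 1, 2

// The proposed patch's "before" half (choice 1): repin a used pconn
// before sending an unretriable request; otherwise send on the pinned
// connection as-is.
BeforeSend patchBeforeSend(bool connectionWasUsed, bool requestRetriable)
{
    if (connectionWasUsed && !requestRetriable)
        return BeforeSend::ReopenAndRepin;
    return BeforeSend::SendOnPinned;
}

// Today's broken 2+0 behavior: send an error message on a race.
OnPconnRace currentOnRace() { return OnPconnRace::SendError; }

// The proposed patch's "after" half (choice 2): retry the (retriable)
// request, like the non-pinned connection code does.
OnPconnRace patchOnRace() { return OnPconnRace::Retry; }

// Henrik's 2+1 alternative: always send on the pinned connection and,
// on a race, reset the client connection like a dumb tunnel would.
OnPconnRace henrikOnRace() { return OnPconnRace::ResetClient; }
```

The 2+1 variant makes `patchBeforeSend` collapse to always returning
`SendOnPinned`, which is what makes removing the repinning code
attractive.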
Re: [PATCH] Do not send unretriable requests on reused pinned connections
Sat 2012-12-01 at 11:42 -0700, Alex Rousskov wrote:
> Perhaps those bug reports are yet to come. Why can't TPROXY reopen a
> server-side connection?

Because it tries to use the original source ip:port, but it's in
TIME_WAIT from the previous connection and can't be reused.

Regards
Henrik
Re: [PATCH] Do not send unretriable requests on reused pinned connections
Fri 2012-11-30 at 15:30 -0700, Alex Rousskov wrote:
> Squid is sending POST requests on reused pinned connections, and some
> of those requests fail due to a pconn race, with no possibility for a
> retry.

Yes... and we have to for NTLM, TPROXY and friends, or they get in a
bit of trouble from connection state mismatch. If sending the request
fails we should propagate this to the client by resetting the client
connection and let the client retry.

> When using SslBump, the HTTP request is always forwarded using a
> server connection pinned to the HTTP client connection. Squid does not
> reuse a persistent connection from the idle pconn pool for bumped
> client requests.

Ok.

> Squid uses the dedicated pinned server connection instead. This
> bypasses pconn race controls even though Squid may be essentially
> reusing an idle HTTP connection and, hence, may experience the same
> kind of race conditions.

Yes..

> However, connections that were just pinned, without sending any
> requests, are not essentially reused idle pconns so we must be careful
> to allow unretriable requests on freshly pinned connections.

?

> The same logic applies to pinned connections outside SslBump.

Which is quite likely the wrong thing to do. See above.

Regards
Henrik
Re: [PATCH] Do not send unretriable requests on reused pinned connections
On 1/12/2012 1:31 p.m., Henrik Nordström wrote:
> Fri 2012-11-30 at 15:30 -0700, Alex Rousskov wrote:
>> Squid is sending POST requests on reused pinned connections, and some
>> of those requests fail due to a pconn race, with no possibility for a
>> retry.
>
> Yes... and we have to for NTLM, TPROXY and friends or they get in a
> bit of trouble from connection state mismatch. If sending the request
> fails we should propagate this to the client by resetting the client
> connection and let the client retry.

It seems to me we are also forced to do this for ssl-bump connections.

* Opening a new connection is the wrong thing to do for server-first
  bumped connections, where the new connection MAY go to a completely
  different server than the one whose certificate was bumped with. We
  control the IP:port we connect to, but we cannot control the
  existence of IP-level load balancers.

* Client-first bumped connections do not face the lag, *BUT* there is
  no way to identify them at forwarding time separately from
  server-first bumped ones.

* We are pretending to be a dumb relay - which offers the ironclad
  guarantee that the server at the other end is a single TCP endpoint
  (DNS uncertainty is only on the initial setup; once connected,
  packets all reach *an* endpoint or the connection dies). We can
  control the outgoing IP:port details, but have no control over the
  existence of IP-level load balancers which can screw with the
  destination server underneath us. Gambling on the destination not
  changing on an HTTPS outbound when retrying for intercepted traffic
  will re-open at least two CVE issues 3.2 is supposed to be immune to
  (CVE-2009-0801 and CVE-2009-3555).

Races are also still very possible on server-bumped connections if for
any reason it takes longer to receive+parse+adapt+reparse the client
request than the server wants to wait for. Remember we have all the
slow trickle arrival of headers, parsing, adaptation, helpers and
access controls to work through before it gets to use the pinned server
conn. For example, Squid is extremely likely to lose closure races on a
mobile network when some big event is on that everyone has to
google/twitter/facebook about while every request gets bumped and sent
through an ICAP filter (BBC at the London Olympics).

>> When using SslBump, the HTTP request is always forwarded using a
>> server connection pinned to the HTTP client connection. Squid does
>> not reuse a persistent connection from the idle pconn pool for bumped
>> client requests.
>
> Ok.
>
>> Squid uses the dedicated pinned server connection instead. This
>> bypasses pconn race controls even though Squid may be essentially
>> reusing an idle HTTP connection and, hence, may experience the same
>> kind of race conditions.
>
> Yes..
>
>> However, connections that were just pinned, without sending any
>> requests, are not essentially reused idle pconns so we must be
>> careful to allow unretriable requests on freshly pinned connections.
>
> ?

A straight usage counter is definitely the wrong thing to use to
control this, whether or not you agree with us that re-trying outbound
connections is safe after guaranteeing the client (with encryption
certificate no less) that a single destination has been set up. What is
needed is a suitable-length idle timeout and a close handler. Both of
which, for bumped connections, should trigger un-pinning and abort the
client connection. If the timeouts are not being set on server-bump
pinned connections then that is the bug and needs to be fixed ASAP.

The issue is not that the conn was used then pooled versus pinned. The
issue is the async period between the last and current packet on the
socket - we have no way to identify if the duration between them has
caused problems (crtd, adaptation or ACL lag might be enough to die
from some race with NAT timeouts). Whether that past use was the SSL
exchange (server-bump only) or a previous HTTP data packet.

I agree this is just as much true on bumped connections which were
pinned at some unknown time earlier as it is for connections pulled out
of a shared pool and last used some unknown time earlier. Regardless of
how the persistence was done, they *are* essentially reused idle
persistent connections. All the same risks/problems, but whether retry
or an alternative connection setup is possible differs greatly between
the traffic types - with intercepted traffic (of any source) the re-try
is more dangerous than informing the client with an aborted connection.

>> The same logic applies to pinned connections outside SslBump.
>
> Which is quite likely the wrong thing to do. See above.
>
> Regards
> Henrik

Amos
Re: [PATCH] Do not send unretriable requests on reused pinned connections
On 11/30/2012 05:31 PM, Henrik Nordström wrote:
> Fri 2012-11-30 at 15:30 -0700, Alex Rousskov wrote:
>> Squid is sending POST requests on reused pinned connections, and some
>> of those requests fail due to a pconn race, with no possibility for a
>> retry.
>
> Yes... and we have to for NTLM, TPROXY and friends or they get in a
> bit of trouble from connection state mismatch.

Those connections that cannot be repinned should not have the canRePin
flag set IIRC. The patch does not change their behavior -- they will
continue to be reused because the last "no choice but to risk it"
condition will be true for them:

+    // reuse a connection if it is safe or if we have no other choice
+    if (pconnRace != racePossible ||  // unused connections are safe
+        checkRetriable() ||           // we can retry if there is a race
+        !request->flags.canRePin) {   // we have no other choice

> If sending the request fails we should propagate this to the client by
> resetting the client connection and let the client retry.

Yes, except when the connection can be repinned safely. Such a
connection (and only such a connection) should not be reused for
non-retriable requests.

>> However, connections that were just pinned, without sending any
>> requests, are not essentially reused idle pconns so we must be
>> careful to allow unretriable requests on freshly pinned connections.
>
> ?

I am not sure what you are asking about, but I can try to rephrase:
This bug is difficult to fix because some pinned connections should be
reused and some should not be. Pinned connections that can be re-pinned
but have not had any HTTP requests sent on them should be reused, even
for unretriable requests. SslBump creates such connections in
forward.cc when Squid connects to the origin server to peek at the
server certificate. Since no HTTP requests were sent on such
connections at the decision time, this is not really a reuse even
though it looks like one in all other aspects.

>> The same logic applies to pinned connections outside SslBump.
>
> Which is quite likely the wrong thing to do. See above.

Does the !flags.canRePin exception address your concern?

Thank you, Alex.
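The reuse rule quoted from the patch can be restated as a standalone
predicate. The function and parameter names below are invented for
illustration; the logic mirrors the three-way condition in the patch
fragment above, but this is a sketch, not the actual Squid code.

```cpp
// Reuse a pinned connection if it is safe, or if we have no other
// choice. Mirrors the patch's condition:
//   pconnRace != racePossible || checkRetriable() || !request->flags.canRePin
// Illustrative restatement, not actual Squid code.
bool reusePinnedConnection(bool racePossible, bool retriable, bool canRePin)
{
    return !racePossible  // unused connections are safe
        || retriable      // we can retry if there is a race
        || !canRePin;     // no other choice but to risk it
}
```

Note that a freshly pinned SslBump connection (no requests sent yet)
falls under the first disjunct, and bare TPROXY/NTLM connections fall
under the last one, matching the behavior described in the thread.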