Re: [PATCH] Do not send unretriable requests on reused pinned connections

2012-12-06 Thread Amos Jeffries

On 5/12/2012 9:19 a.m., Alex Rousskov wrote:

On 12/01/2012 06:02 PM, Alex Rousskov wrote:

To summarize, our choices for a pinned connection created for the
current client are:

Before sending the request:

1. Reopen and repin used pconns to send unretriable requests,
just like non-pinned connection code does. This eliminates
HTTP pconn race conditions for unretriable requests. This is
what the proposed patch does for SslBump.

2. Send all requests without reopening the pinned connection.
If we encounter an HTTP pconn race condition, see below.


After the request, if an HTTP pconn race happens:

0. Send an error message to the client.
This is what we do today and it breaks things.

1. Reset the client connection.
Pretend to be a dumb tunnel.

2. Retry, just like the current non-pinned connection code does.
This is what the proposed patch implements.


Current code does 2+0. That does not work.

The proposed patch does 1+2. That works in my limited tests.

Henrik suggested 2+1. I agree that it is also an option worth
considering.

I am leaning towards implementing 2+1 _and_ removing the current
connection repinning code as not needed.

We could leave repinning code in case it is needed later to fix broken
clients, but no such clients are known at this time, while that code is
complex and requires fixes (the patch posted here earlier), so I am
leaning towards removing it completely for now. With some effort it can
be re-introduced at a later time, of course.

If there are any better ideas, please let me know.


Re-pinning is somewhat useful for the cases of:
 * Kerberos/Bearer auth, where the credentials on the request are sufficient 
to re-auth the new connection identically to the original one.
 * NAT (not TPROXY) intercepted connections to the same destination 
IP:port, where verification can lock it down to that destination, but 
not completely.


In both cases this holds provided the destination IP:port is the same as the 
original one. Were it not for the required client IP:port combination already 
being in use, this would also be true for TPROXY. And were it not for the 
handshake hassle, this would also be true for NTLM.


The cases with problems are:
 * NTLM, due to the handshake complexity which nobody has implemented 
yet (otherwise this would be in the useful category too, with Kerberos)

 * TPROXY, where the src+dst tuple collides with the old connection
 * CONNECT, where we don't know what the state in the data channel 
contains. Bumping should fit into this too if we are to emulate the 
CONNECT properly, even though we do know the state.
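
To make that case split concrete, here is a minimal sketch of how the 
distinctions might be encoded when deciding whether a pinned connection is 
eligible for re-pinning. The type and helper names are hypothetical, not 
Squid's actual internals.

    // Hypothetical sketch, not Squid code: which pinning reasons allow a safe re-pin?
    enum class PinReason { KerberosAuth, BearerAuth, NtlmAuth, NatIntercept,
                           TproxyIntercept, ConnectTunnel, SslBump };

    // A connection may be re-pinned only if a new connection to the same
    // destination IP:port can be made semantically equivalent to the old one.
    bool canRePin(PinReason reason, bool sameDstIpPort) {
        if (!sameDstIpPort)
            return false;
        switch (reason) {
        case PinReason::KerberosAuth:
        case PinReason::BearerAuth:      // request credentials re-auth the new connection
        case PinReason::NatIntercept:    // verification can lock it to that destination
            return true;
        case PinReason::NtlmAuth:        // handshake replay not implemented
        case PinReason::TproxyIntercept: // src+dst tuple may collide with the old connection
        case PinReason::ConnectTunnel:   // unknown data-channel state
        case PinReason::SslBump:         // emulating CONNECT, so treat it like a tunnel
            return false;
        }
        return false;
    }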



Amos


Re: [PATCH] Do not send unretriable requests on reused pinned connections

2012-12-06 Thread Alex Rousskov
On 12/06/2012 05:21 AM, Amos Jeffries wrote:
 On 5/12/2012 9:19 a.m., Alex Rousskov wrote:
 On 12/01/2012 06:02 PM, Alex Rousskov wrote:
 To summarize, our choices for a pinned connection created for the
 current client are:

 Before sending the request:

 1. Reopen and repin used pconns to send unretriable requests,
 just like non-pinned connection code does. This eliminates
 HTTP pconn race conditions for unretriable requests. This is
 what the proposed patch does for SslBump.

 2. Send all requests without reopening the pinned connection.
 If we encounter an HTTP pconn race condition, see below.


 After the request, if an HTTP pconn race happens:

 0. Send an error message to the client.
 This is what we do today and it breaks things.

 1. Reset the client connection.
 Pretend to be a dumb tunnel.

 2. Retry, just like the current non-pinned connection code does.
 This is what the proposed patch implements.


 Current code does 2+0. That does not work.

 The proposed patch does 1+2. That works in my limited tests.

 Henrik suggested 2+1. I agree that it is also an option worth
 considering.
 I am leaning towards implementing 2+1 _and_ removing the current
 connection repinning code as not needed.

 We could leave repinning code in case it is needed later to fix broken
 clients, but no such clients are known at this time, while that code is
 complex and requires fixes (the patch posted here earlier), so I am
 leaning towards removing it completely for now. With some effort it can
 be re-introduced at a later time, of course.

 If there are any better ideas, please let me know.
 
 Re-pinning is somewhat useful for the cases of:
  * Kerberos/Bearer auth, where the credentials on the request are sufficient
 to re-auth the new connection identically to the original one.
  * NAT (not TPROXY) intercepted connections to the same destination
 IP:port, where verification can lock it down to that destination, but
 not completely.


Hi Amos,

Since repinning has been declared a bug in general, let's leave
developing support for these special cases for those who need them.
Those folks would be in a better position to advocate that their cases
need repinning _and_ properly detect safe repinning conditions, validate
repinned connections, etc. They would be able to reuse most of our
deleted code, of course, but some new code will be needed to take care
of their cases anyway.

The same applies to any other special cases for repinning that we are
not aware of right now. Hopefully, each repinning case will be reviewed
more carefully in the future, so that its code does not have to be removed again.


Thank you,

Alex.



Re: [PATCH] Do not send unretriable requests on reused pinned connections

2012-12-04 Thread Alex Rousskov
On 12/01/2012 06:02 PM, Alex Rousskov wrote:
 To summarize, our choices for a pinned connection created for the
 current client are:
 
 Before sending the request:
 
 1. Reopen and repin used pconns to send unretriable requests,
just like non-pinned connection code does. This eliminates
HTTP pconn race conditions for unretriable requests. This is
what the proposed patch does for SslBump.
 
 2. Send all requests without reopening the pinned connection.
If we encounter an HTTP pconn race condition, see below.
 
 
 After the request, if an HTTP pconn race happens:
 
 0. Send an error message to the client.
This is what we do today and it breaks things.
 
 1. Reset the client connection.
Pretend to be a dumb tunnel.
 
 2. Retry, just like the current non-pinned connection code does.
This is what the proposed patch implements.
 
 
 Current code does 2+0. That does not work.
 
 The proposed patch does 1+2. That works in my limited tests.
 
 Henrik suggested 2+1. I agree that it is also an option worth
 considering.

I am leaning towards implementing 2+1 _and_ removing the current
connection repinning code as not needed.

We could leave repinning code in case it is needed later to fix broken
clients, but no such clients are known at this time, while that code is
complex and requires fixes (the patch posted here earlier), so I am
leaning towards removing it completely for now. With some effort it can
be re-introduced at a later time, of course.

If there are any better ideas, please let me know.


Thank you,

Alex.



Re: [PATCH] Do not send unretriable requests on reused pinned connections

2012-12-02 Thread Henrik Nordström
Sat 2012-12-01 at 18:02 -0700, Alex Rousskov wrote:

 a) If we are talking about unretriable requests, the client already
 violated HTTP by sending such a request over the persistent client
 connection. Some of these clients may fail to handle resets properly.

End-user-agents have much more freedom in what is retriable or not.

 b) To implement 2+1 correctly, Squid will need to close the _client_
 connection when the origin server sends Connection: close. We do not
 do that now IIRC.

Yes. If we don't do that today then it's a bug.
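
For illustration, a hedged sketch of that propagation (hypothetical types, not
Squid's parser or reply structures): when the pinned origin server announces
Connection: close, the client connection must stop being persistent too,
otherwise the next client request races against the closing server socket.

    // Minimal sketch under assumed names; real code would inspect the parsed reply headers.
    struct ReplyHeaders { bool connectionClose = false; };
    struct ClientConnState { bool keepAlive = true; };

    void forwardReplyPersistence(const ReplyHeaders &serverReply, ClientConnState &client) {
        if (serverReply.connectionClose)
            client.keepAlive = false; // close the client connection after relaying this reply
    }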

Regards
Henrik



Re: [PATCH] Do not send unretriable requests on reused pinned connections

2012-12-01 Thread Henrik Nordström
Fri 2012-11-30 at 23:07 -0700, Alex Rousskov wrote:
 I am not sure what you are asking about, but I can try to rephrase: This
 bug is difficult to fix because some pinned connections should be reused
 and some should not be. Pinned connections that can be re-pinned but
 have not had any HTTP requests sent on them should be reused, even for
 unretriable requests. SslBump creates such connections in forward.cc
 when Squid connects to the origin server to peek at the server
 certificate. Since no HTTP requests were sent on such connections at the
 decision time, this is not really a reuse even though it looks like one
 in all other aspects.

It is. You must take care to not reuse a slightly old (1s or so)
connection under those conditions.

  Which is quite likely the wrong thing to do. See above.
 
 Does the !flags.canRePin exception address your concern?

Yes, if used where needed (TPROXY, NTLM).

Regards
Henrik



Re: [PATCH] Do not send unretriable requests on reused pinned connections

2012-12-01 Thread Alex Rousskov
On 12/01/2012 11:20 AM, Henrik Nordström wrote:
 Fri 2012-11-30 at 23:07 -0700, Alex Rousskov wrote:
 Does the !flags.canRePin exception address your concern?

 Yes, if used where needed (TPROXY, NTLM).

By default, the canRePin flag is not set and pinned connections are
reused, even for unretriable requests. Thus, bare TPROXY and NTLM code
should be fine without any special/additional changes.

However, a combination of TPROXY and SslBump will see the canRePin flag
set (by the SslBump code). We have not heard complaints that the combo
does not work even though recent SslBump code reopened and repinned
closed server-side connections. Perhaps those bug reports are yet to
come. Why can't TPROXY reopen a server-side connection?


Thank you,

Alex.
P.S. Even if we do not handle TPROXY+SslBump combo correctly today, the
required fix will be outside the proposed patch. The patch is still
needed to handle cases where pinned connections can be reopened and
repinned.



Re: [PATCH] Do not send unretriable requests on reused pinned connections

2012-12-01 Thread Alex Rousskov
On 11/30/2012 08:27 PM, Amos Jeffries wrote:
 On 1/12/2012 1:31 p.m., Henrik Nordström wrote:
 Fri 2012-11-30 at 15:30 -0700, Alex Rousskov wrote:

  Squid is sending POST requests on reused pinned connections, and
 some of those requests fail due to a pconn race, with no possibility for
 a retry.
 Yes... and we have to for NTLM, TPROXY and friends or they get in a bit
 of trouble from connection state mismatch.

 If sending the request fails we should propagate this to the client by
 resetting the client connection and letting the client retry.
 
 It seems to me we are also forced to do this for ssl-bump connections.

The current code does not force it. From a user experience point of view,
resetting the client connection is usually the wrong choice in this
context (it breaks sites/transactions that work correctly with the
current code). Thus, if we disagree on whether this should be the default
behavior, somebody will need to add a knob to control it from squid.conf.


  * Opening a new connection is the wrong thing to do for server-first
 bumped connections, where the new connection MAY go to a completely
 different server than the one whose certificate we bumped with. We
 control the IP:port we connect to, but we cannot control IP-level load
 balancers' existence.

Not exactly. Opening a new connection MIGHT be wrong (not IS wrong). In
all real use cases we know of today, it is actually the right thing to
do. So far, we have found no cases where a load balancer or a similar
device sends the request to a semantically different server. I am not
saying such cases do not exist, but they are definitely not the norm.

Please note that Squid checks the certificate of the new server when the
connection is repinned. We do not just trust that the new server
certificate is going to be similar. IIRC, this has been discussed in the
past, when bump-server-first patches were reviewed.
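
For illustration only (this is not the actual SslBump validation code), the
check can be thought of as comparing the new peer certificate against the one
originally bumped and refusing to repin on a mismatch:

    // Hedged sketch using plain OpenSSL calls; Squid's real verification lives elsewhere.
    #include <openssl/ssl.h>
    #include <openssl/x509.h>

    bool sameServerCert(SSL *repinnedSession, X509 *originallyBumpedCert) {
        X509 *fresh = SSL_get_peer_certificate(repinnedSession); // new server's cert (ref-counted)
        if (!fresh || !originallyBumpedCert)
            return false;
        const bool identical = (X509_cmp(fresh, originallyBumpedCert) == 0);
        X509_free(fresh);
        return identical;
    }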


  * client-first bumped connections do not face the lag, *BUT* there is
 no way to identify them at forwarding time separately from server-first
 bumped.

The lag is not the primary issue here. Pconn races are. All SslBump
connections suffer from those.

We can discuss the lag separately. Since reconnecting after a 1-second lag
may face the same lag again, it may not really help. I would not object to
adding an optional feature to reconnect after an X-second lag, but I
recommend waiting for a real use case to justify that complication (and
to help us detail the fix correctly).



  * we are pretending to be a dumb relay - which offers the ironclad
 guarantee that the server at the other end is a single TCP endpoint (DNS
 uncertainty exists only during the initial setup; once connected, packets
 either all reach *an* endpoint or the connection dies).

Conceptually, yes. In practice, we break this: popular-site compatibility
reasons require us to make implementation adjustments to that nice
theoretical concept of a dumb relay.


 We can control the outgoing IP:port details, but have no control over
 the existence of IP-level load balancers which can screw with the
 destination server underneath us. Gambling on the destination not
 changing on an HTTPS outbound when retrying for intercepted traffic will
 re-open at least two CVE issues that 3.2 is supposed to be immune to
 (CVE-2009-0801 and CVE-2009-3555).

We do not gamble. We verify that the new destination has a similar
certificate.


 Races are also still very possible on server-bumped connections if for
 any reason it takes longer to receive+parse+adapt+reparse the client
 request than the server wants to wait for.

Yes, but that would not be a pconn race. Let's not enlarge the scope of
the problem in the context of this patch review. The proposed patch
fixes pconn races (with real use cases behind the fix!). It does not
pretend to fix something else.


 A straight usage counter is definitely the wrong thing to use to
 control this, whether or not you agree with us that re-trying outbound
 connections is safe after guaranteeing the client (with an encryption
 certificate no less) that a single destination has been set up. What is
 needed is a suitable-length idle timeout and a close handler.

The usage counter is the right approach to detect pconn races, by
definition: A pconn race happens _after_ the first HTTP connection use.
The counter simply helps us detect when we have used the HTTP connection
at least once.
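
A minimal sketch of that idea, with hypothetical names (the real patch uses
pconnRace and checkRetriable(), as quoted elsewhere in this thread):

    // Hedged sketch: a pinned connection that has already carried at least one
    // HTTP request is subject to the same pconn race as a pooled idle pconn.
    struct PinnedServerConn {
        int httpRequestsSent = 0; // bumped each time a request goes out on it
        bool canRePin = false;    // safe to reopen and pin again?
    };

    bool pconnRacePossible(const PinnedServerConn &conn) {
        return conn.httpRequestsSent > 0; // freshly pinned connections are safe
    }

    // Mirrors the patched decision: reuse unless a race is possible, the
    // request cannot be retried, and a safe re-pin is available instead.
    bool reusePinnedConn(const PinnedServerConn &conn, bool requestIsRetriable) {
        return !pconnRacePossible(conn) || requestIsRetriable || !conn.canRePin;
    }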


 The issue is not that the conn was used then pooled versus pinned. The
 issue is the async period between the last and current packets on the socket
 - we have no way to identify whether the duration in between has caused
 problems (crtd, adaptation or ACL lag might be enough to die from some race
 with NAT timeouts), whether that past use was the SSL exchange (server-bump
 only) or a previous HTTP data packet.

I am not trying to fix all possible race conditions. The patch is trying
to fix just one: an HTTP pconn race. Other fixes would be welcomed, of
course, especially if backed by real use cases/need.




Re: [PATCH] Do not send unretriable requests on reused pinned connections

2012-12-01 Thread Alex Rousskov
Hello,

It just occurred to me that there is nothing wrong, from a theoretical
protocol point of view, with resetting the client connection when a
server-side race condition is detected _and_ we know that the client was
the one that caused Squid to open the server connection. I personally do
not know whether there are use cases that would be broken by that.

To summarize, our choices for a pinned connection created for the
current client are:

Before sending the request:

1. Reopen and repin used pconns to send unretriable requests,
   just like non-pinned connection code does. This eliminates
   HTTP pconn race conditions for unretriable requests. This is
   what the proposed patch does for SslBump.

2. Send all requests without reopening the pinned connection.
   If we encounter an HTTP pconn race condition, see below.


After the request, if an HTTP pconn race happens:

0. Send an error message to the client.
   This is what we do today and it breaks things.

1. Reset the client connection.
   Pretend to be a dumb tunnel.

2. Retry, just like the current non-pinned connection code does.
   This is what the proposed patch implements.


Current code does 2+0. That does not work.

The proposed patch does 1+2. That works in my limited tests.

Henrik suggested 2+1. I agree that it is also an option worth
considering. Will it break actual clients? I do not know, but:

a) If we are talking about unretriable requests, the client already
violated HTTP by sending such a request over the persistent client
connection. Some of these clients may fail to handle resets properly.

b) To implement 2+1 correctly, Squid will need to close the _client_
connection when the origin server sends Connection: close. We do not
do that now IIRC.
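
For reference, a rough sketch of what the client-side half of 2+1 would entail.
The names are hypothetical and this is plain POSIX socket code, not Squid's:
closing with SO_LINGER set to zero is the usual way to force the TCP RST that
"pretending to be a dumb tunnel" implies.

    #include <sys/socket.h>
    #include <unistd.h>

    void resetClientConnection(int clientFd) {
        // Linger time 0 discards unsent data and makes close() send a TCP RST.
        struct linger hardReset = { 1 /* on */, 0 /* seconds */ };
        setsockopt(clientFd, SOL_SOCKET, SO_LINGER, &hardReset, sizeof(hardReset));
        close(clientFd);
    }

    void onPinnedServerRace(int clientFd, bool firstUseOfPinnedConn) {
        if (firstUseOfPinnedConn)
            return; // not a pconn race; treat it as an ordinary connection failure
        resetClientConnection(clientFd); // option "1" after the race
    }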


Thank you,

Alex.



Re: [PATCH] Do not send unretriable requests on reused pinned connections

2012-12-01 Thread Henrik Nordström
Sat 2012-12-01 at 11:42 -0700, Alex Rousskov wrote:
 come. Why can't TPROXY reopen a server-side connection?

Because it tries to use the original source ip:port, but it's in
TIME_WAIT from the previous connection and can't be reused.
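
To illustrate (a bare Linux-socket sketch, not Squid code; IP_TRANSPARENT
needs CAP_NET_ADMIN): spoofing the original client IP:port for a fresh
upstream connection collides with the old connection's 4-tuple while that
tuple sits in TIME_WAIT.

    #include <arpa/inet.h>
    #include <cerrno>
    #include <cstdio>
    #include <cstring>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>

    // Try to reopen a TPROXY-style connection using the client's own address.
    int reopenSpoofed(const sockaddr_in &clientAddr, const sockaddr_in &serverAddr) {
        const int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;
        int on = 1;
        setsockopt(fd, SOL_IP, IP_TRANSPARENT, &on, sizeof(on)); // allow the non-local bind
        if (bind(fd, reinterpret_cast<const sockaddr *>(&clientAddr), sizeof(clientAddr)) < 0 ||
            connect(fd, reinterpret_cast<const sockaddr *>(&serverAddr), sizeof(serverAddr)) < 0) {
            // While the old src:dst tuple is in TIME_WAIT this typically fails
            // with EADDRINUSE or EADDRNOTAVAIL, so the connection cannot be re-pinned.
            fprintf(stderr, "re-pin failed: %s\n", strerror(errno));
            close(fd);
            return -1;
        }
        return fd;
    }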

Regards
Henrik



Re: [PATCH] Do not send unretriable requests on reused pinned connections

2012-11-30 Thread Henrik Nordström
Fri 2012-11-30 at 15:30 -0700, Alex Rousskov wrote:

 Squid is sending POST requests on reused pinned connections, and
 some of those requests fail due to a pconn race, with no possibility for
 a retry.

Yes... and we have to for NTLM, TPROXY and friends or they get in a bit
of trouble from connection state mismatch.

If sending the request fails we should propagate this to the client by
resetting the client connection and letting the client retry.

 When using SslBump, the HTTP request is always forwarded using a server
 connection pinned to the HTTP client connection. Squid does not reuse
 a persistent connection from the idle pconn pool for bumped client
 requests.

Ok.

  Squid uses the dedicated pinned server connection instead.
 This bypasses pconn race controls even though Squid may be essentially
 reusing an idle HTTP connection and, hence, may experience the same kind
 of race conditions.

Yes..

 However, connections that were just pinned, without sending any
 requests, are not essentially reused idle pconns so we must be careful
 to allow unretriable requests on freshly pinned connections.

?

 The same logic applies to pinned connections outside SslBump.

Which is quite likely the wrong thing to do. See above.

Regards
Henrik



Re: [PATCH] Do not send unretriable requests on reused pinned connections

2012-11-30 Thread Amos Jeffries

On 1/12/2012 1:31 p.m., Henrik Nordström wrote:

Fri 2012-11-30 at 15:30 -0700, Alex Rousskov wrote:


 Squid is sending POST requests on reused pinned connections, and
some of those requests fail due to a pconn race, with no possibility for
a retry.

Yes... and we have to for NTLM, TPROXY and friends or they get in a bit
of trouble from connection state mismatch.

If sending the request fails we should propagate this to the client by
resetting the client connection and letting the client retry.


It seems to me we are also forced to do this for ssl-bump connections.
 * Opening a new connection is the wrong thing to do for server-first 
bumped connections, where the new connection MAY go to a completely 
different server than the one whose certificate we bumped with. We 
control the IP:port we connect to, but we cannot control IP-level load 
balancers' existence.
 * client-first bumped connections do not face the lag, *BUT* there is 
no way to identify them at forwarding time separately from server-first 
bumped.
 * we are pretending to be a dumb relay - which offers the ironclad 
guarantee that the server at the other end is a single TCP endpoint (DNS 
uncertainty exists only during the initial setup; once connected, packets 
either all reach *an* endpoint or the connection dies).



We can control the outgoing IP:port details, but have no control over 
the existence of IP-level load balancers which can screw with the 
destination server underneath us. Gambling on the destination not 
changing on an HTTPS outbound when retrying for intercepted traffic will 
re-open at least two CVE issues that 3.2 is supposed to be immune to 
(CVE-2009-0801 and CVE-2009-3555).


Races are also still very possible on server-bumped connections if for 
any reason it takes longer to receive+parse+adapt+reparse the client 
request than the server wants to wait for. Remember we have all the slow 
trickle arrival of headers, parsing, adaptation, helpers and access 
controls to work through before it gets to use the pinned server conn. 
For example Squid is extremely likely to lose closure races on a mobile 
network when some big event is on that everyone has to 
google/twitter/facebook about while every request gets bumped and sent 
through an ICAP filter (BBC at the London Olympics).





When using SslBump, the HTTP request is always forwarded using a server
connection pinned to the HTTP client connection. Squid does not reuse
a persistent connection from the idle pconn pool for bumped client
requests.

Ok.


  Squid uses the dedicated pinned server connection instead.
This bypasses pconn race controls even though Squid may be essentially
reusing an idle HTTP connection and, hence, may experience the same kind
of race conditions.

Yes..


However, connections that were just pinned, without sending any
requests, are not essentially reused idle pconns so we must be careful
to allow unretriable requests on freshly pinned connections.

?


A straight usage counter is definitely the wrong thing to use to 
control this, whether or not you agree with us that re-trying outbound 
connections is safe after guaranteeing the client (with an encryption 
certificate no less) that a single destination has been set up. What is 
needed is a suitable-length idle timeout and a close handler.
 Both of which, for bumped connections, should trigger un-pinning and 
abort the client connection. If the timeouts are not being set on 
server-bump pinned connections then that is the bug and it needs to be 
fixed ASAP.
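
A hedged sketch of that alternative, with invented helper names (this is not
Squid's real comm API): arm an idle timeout and a close handler on the pinned
server connection, and have either event un-pin it and abort the client
connection.

    #include <functional>

    struct Conn { int fd = -1; };

    // Hypothetical registration helpers standing in for the real comm layer.
    void armIdleTimeout(Conn &, int /*seconds*/, std::function<void()> /*callback*/) {}
    void armCloseHandler(Conn &, std::function<void()> /*callback*/) {}

    struct PinnedPair {
        Conn serverConn;
        Conn clientConn;

        void unpin() { serverConn = Conn(); }          // forget the stale server conn
        void abortClient() { /* reset clientConn here */ }

        void watchPinnedServer(int idleSeconds) {
            auto giveUp = [this] { unpin(); abortClient(); };
            armIdleTimeout(serverConn, idleSeconds, giveUp); // lag exceeded the safe window
            armCloseHandler(serverConn, giveUp);             // server closed while we were busy
        }
    };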



The issue is not that the conn was used then pooled versus pinned. The 
issue is the async period between the last and current packets on the socket 
- we have no way to identify whether the duration in between has caused 
problems (crtd, adaptation or ACL lag might be enough to die from some race 
with NAT timeouts), whether that past use was the SSL exchange (server-bump 
only) or a previous HTTP data packet. I agree this is just as much true 
on bumped connections which were pinned at some unknown time earlier as 
it is for connections pulled out of a shared pool and last used some 
unknown time earlier. Regardless of how the persistence was done they 
*are* essentially reused idle persistent connections. All the same 
risks/problems, but whether a retry or an alternative connection setup is 
possible differs greatly between the traffic types - with intercepted 
traffic (from any source) the re-try is more dangerous than informing the 
client with an aborted connection.





The same logic applies to pinned connections outside SslBump.

Which is quite likely the wrong thing to do. See above.

Regards
Henrik



Amos


Re: [PATCH] Do not send unretriable requests on reused pinned connections

2012-11-30 Thread Alex Rousskov
On 11/30/2012 05:31 PM, Henrik Nordström wrote:
 Fri 2012-11-30 at 15:30 -0700, Alex Rousskov wrote:
 
 Squid is sending POST requests on reused pinned connections, and
 some of those requests fail due to a pconn race, with no possibility for
 a retry.
 
 Yes... and we have to for NTLM, TPROXY and friends or they get in a bit
 of trouble from connection state mismatch.

Those connections that cannot be repinned should not have the canRePin flag
set, IIRC. The patch does not change their behavior -- they will continue
to be reused because the last "no choice but to risk it" condition will
be true for them:

 +// reuse a connection if it is safe or if we have no other choice
 +if (pconnRace != racePossible || // unused connections are safe
 +checkRetriable() || // we can retry if there is a race
 +!request->flags.canRePin) { // we have no other choice


 If sending the request fails we should propagate this to the client by
 resetting the client connection and let the client retry.

Yes, except when the connection can be repinned safely. Such a
connection (and only such a connection) should not be reused for
non-retriable requests.


  Squid uses the dedicated pinned server connection instead.
 This bypasses pconn race controls even though Squid may be essentially
 reusing an idle HTTP connection and, hence, may experience the same kind
 of race conditions.
 
 Yes..
 
 However, connections that were just pinned, without sending any
 requests, are not essentially reused idle pconns so we must be careful
 to allow unretriable requests on freshly pinned connections.
 
 ?

I am not sure what you are asking about, but I can try to rephrase: This
bug is difficult to fix because some pinned connections should be reused
and some should not be. Pinned connections that can be re-pinned but
have not had any HTTP requests sent on them should be reused, even for
unretriable requests. SslBump creates such connections in forward.cc
when Squid connects to the origin server to peek at the server
certificate. Since no HTTP requests were sent on such connections at the
decision time, this is not really a reuse even though it looks like one
in all other aspects.


 The same logic applies to pinned connections outside SslBump.
 
  Which is quite likely the wrong thing to do. See above.

Does the !flags.canRePin exception address your concern?


Thank you,

Alex.