On 31/07/17 22:24, Christos Tsantilas wrote:
Στις 30/07/2017 06:48 πμ, ο Amos Jeffries έγραψε:
On 27/07/17 18:52, Christos Tsantilas wrote:
The patch.

Στις 26/07/2017 12:37 μμ, ο Christos Tsantilas έγραψε:
Squid can be killed or maimed by enough clients that start multi-step connection authentication but never follow up with the second HTTP request while keeping their HTTP connection open. Affected helpers remain in the "reserved" state and cannot be reused for other clients. Observed helper exhaustion has happened without any malicious intent.

To address the problem, we add a helper reservation timeout. Timed out reserved helpers may be reused by new clients/connections. To minimize problems with slow-to-resume-authentication clients, timed out reserved helpers are not reused until there are no unreserved running helpers left. The reservations are tracked using unique integer IDs.


Since NTLM and Negotiate are both very stateful security protocols this re-use is only possible if the helper is using concurrency in its communications with Squid. To do so otherwise would randomly allow replay attacks to succeed - in a way that would be extremely nasty to troubleshoot. Attackers are fully able to flood an auth backend with traffic outside of Squid and slow it down sufficiently for this attack to become a problem. Note that the type-1 tokens where the TCP connection can be closed also should not reserve a helper - if that is happening it is a bug regardless of this patch.


To reach the desired behaviour it would be better to actually implement concurrency support for the stateful helpers interfaces. So we can ensure the helper is fully aware of the separation between clients auth sessions regardless of whether any given token gets replayed.

It should not be much of a step to use the concurrency channel-ID as part of the reservationId. Helpers that support concurrency should not have trouble servicing auth on other channel-ID until the total number of reserved channels gets extremely high. At that point restarting the helper is sane and the existing on-persistent-overload logics can be applied to whether Squid dies or simply kills the blocked helper (ERR).


This patch is going most of the way towards that already by using reservation-ID to replace hardcoded class dependencies, and moving a lot of code to the Base class. But it does not go far enough to make the change safe to add to Squid IMO.


NP: I have only briefly checked the code to be sure this was not actually doing the concurrency change yet without mentioning it. However, one other major issue stands out;

StatefulGetFirstAvailable() is a C-style accessor method for the stateful_helper, we simply have not managed to make it one properly yet. The value it is given is the context within/for which the caller needs it to find a usable *_server. Changing that context in order to find a different server result (for some other context) is not an appropriate thing to be doing.

The only thing in the new code for StatefulGetFirstAvailable() which might lead to it needing non-const is the HelperServerBase::reserved() method being indirectly called. That method should be "virtual bool reserved() const = 0". Fixing that should make the

Yes this method can be const.

StatefulGetFirstAvailable() API const'ness change unnecessary.

This change was not required in any case. Sorry.



-1 for now, and please do not apply this until the replay attack problem is fully resolved.

My sense is that the attack problem can never be "fully" resolved.
But the patch does not try to solve any attack problem. It try to solve problems on normal squid operation.

The problem you described of helpers being hung in reserved state cannot happen during *normal* NTLM/Negotiate operations. Normal clients a) complete the auth sequence, or b) terminate the connection quickly and thus trigger cleanups to happen in Squid, or c) are actually still going to produce the token this helper is reserved and waiting for. None of the _normal_ traffic cases are served well by this change.

So either there is a bug in Squid client-connection cleanup (case 'b') which you are not addressing at all. Or, the use-case you are solving for here are ones where clients are performing a DoS attack against the proxy, whether you understood that or not.


What your patch does is changing a basic DoS vulnerability to an account hijacking / replay vulnerability. Even though the replay may be much more difficult to achieve than the original DoS the damage and severity when it happens is far, far worse. Sufficiently so that the original DoS is preferable to have happening.

At the very least the DoS situation is easy to see and remedy. Probably that visibility is why it gets so much complaining and the other (several!) common problems with these helpers get hardly a mention.



This patch plus the client_ip_max_connections option, and a small reservation-timeout value should be enough for most cases.

Using concurrency for stateful helpers require changes in existing stateful helpers distributed with squid and other custom helpers used by many users. The changes investigated by this patch required even if we are going to support concurrency to allow users using their existing helpers.

I'm not proposing that we change the interface to concurrency always-on. It should be consistent with the concurrency on the other helper interfaces, which is default-off but available for those who need and can use it. For all those same reasons you mention.

For NTLM the common (only?) helper is the Samba one. That apparently already supports concurrency. So if that works with our latest type of concurrency numbering, then the change should not be a huge problem for NTLM users.

For Kerberos the common helpers are Marcus ones which we ship with Squid. Some coordination will be needed there to keep his separately distributed copy in sync, but relatively easy.




As you said "this patch is going most of the way towards".
I do not think it is good idea to reject this patch only because it "does not go far enough". In a second step and if required, we can implement concurrency on stateful helpers.

In this case not going far enough leaves open a far worse vulnerability than the one being resolved. I'm rejecting it primarily because of the replay/hijack addition being worse than the DoS.

If you can come up with some other way to avoid the DoS pains without getting into worse problems I am open to temporary measures. But given the direction this code has gone it seems to me that going for the concurrency support is less additional work than the analysis required to find+code good alternatives would take.


FWIW; I am tempted by the old idea of just letting Squid abort the helpers that are hung and starting new ones afresh. But experiences coming back from people using the dynamic-helpers feature show that even that does not help much. Instead of Squid aborting with the FATAL message it continues until the kernel oom-killer aborts it instead anyway, or for some very high-performance proxies the TCP stack can still run out of sockets in the half second or so before the new helpers have started producing responses.
 Up to you whether you think implementing that as a quick-fix is worth it.


Amos
_______________________________________________
squid-dev mailing list
[email protected]
http://lists.squid-cache.org/listinfo/squid-dev

Reply via email to