Re: [squid-dev] [PATCH] Reuse reserved Negotiate and NTLM helpers after an idle timeout.

Amos Jeffries Mon, 31 Jul 2017 08:25:26 -0700

On 31/07/17 22:24, Christos Tsantilas wrote:

On 30/07/2017 06:48 AM, Amos Jeffries wrote:
On 27/07/17 18:52, Christos Tsantilas wrote:
The patch.
On 26/07/2017 12:37 PM, Christos Tsantilas wrote:
Squid can be killed or maimed by enough clients that start multi-step connection authentication but never follow up with the second HTTP request while keeping their HTTP connection open. Affected helpers remain in the "reserved" state and cannot be reused for other clients. Observed helper exhaustion has happened without any malicious intent.
To address the problem, we add a helper reservation timeout. Timed out reserved helpers may be reused by new clients/connections. To minimize problems with slow-to-resume-authentication clients, timed out reserved helpers are not reused until there are no unreserved running helpers left. The reservations are tracked using unique integer IDs.
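A minimal sketch of the reservation policy described above, not the patch itself. The names `Helper`, `HelperPool`, `reserve`, `find` and `release` are hypothetical, and the idle time is simplified to "time since the reservation was made". It shows the two rules from the description: unreserved helpers are always preferred, and a timed out reserved helper is handed to a new client only when nothing else is free. Each hand-out gets a fresh integer ID, so the old holder's ID no longer matches any helper.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

using Clock = std::chrono::steady_clock;
using ReservationId = std::uint64_t; // assumption: 0 means "not reserved"

// Hypothetical stand-in for one stateful helper process.
struct Helper {
    ReservationId reservation = 0;
    Clock::time_point lastUsed{}; // simplified: when the reservation was made
    bool reserved() const { return reservation != 0; }
};

class HelperPool {
public:
    HelperPool(std::size_t count, std::chrono::seconds timeout)
        : helpers_(count), timeout_(timeout) {}

    // Returns a fresh reservation ID, or nothing when every helper is
    // reserved and no reservation has timed out yet.
    std::optional<ReservationId> reserve(Clock::time_point now) {
        Helper *choice = nullptr;
        // First preference: a helper that is not reserved at all.
        for (auto &h : helpers_) {
            if (!h.reserved()) { choice = &h; break; }
        }
        // Fallback: the timed out reservation that has been idle longest.
        if (!choice) {
            for (auto &h : helpers_) {
                if (now - h.lastUsed >= timeout_ &&
                    (!choice || h.lastUsed < choice->lastUsed))
                    choice = &h;
            }
        }
        if (!choice)
            return std::nullopt;
        choice->reservation = ++lastId_; // unique per hand-out
        choice->lastUsed = now;
        return choice->reservation;
    }

    // Looks up the helper holding this reservation. A stale ID, whose
    // helper was re-assigned after a timeout, finds nothing.
    Helper *find(ReservationId id) {
        assert(id != 0);
        for (auto &h : helpers_)
            if (h.reservation == id) return &h;
        return nullptr;
    }

    void release(ReservationId id) {
        if (Helper *h = find(id)) h->reservation = 0;
    }

private:
    std::vector<Helper> helpers_;
    std::chrono::seconds timeout_;
    ReservationId lastId_ = 0;
};
```

The sketch only covers bookkeeping on the Squid side. It does nothing about state left inside the helper process by the previous client.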
Since NTLM and Negotiate are both very stateful security protocols this re-use is only possible if the helper is using concurrency in its communications with Squid. To do so otherwise would randomly allow replay attacks to succeed - in a way that would be extremely nasty to troubleshoot. Attackers are fully able to flood an auth backend with traffic outside of Squid and slow it down sufficiently for this attack to become a problem.
Note that the type-1 tokens where the TCP connection can be closed also should not reserve a helper - if that is happening it is a bug regardless of this patch.
To reach the desired behaviour it would be better to actually implement concurrency support for the stateful helpers interfaces. So we can ensure the helper is fully aware of the separation between clients auth sessions regardless of whether any given token gets replayed.
It should not be much of a step to use the concurrency channel-ID as part of the reservationId. Helpers that support concurrency should not have trouble servicing auth on other channel-ID until the total number of reserved channels gets extremely high. At that point restarting the helper is sane and the existing on-persistent-overload logics can be applied to whether Squid dies or simply kills the blocked helper (ERR).
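One way to picture the channel-ID idea, as a rough sketch under stated assumptions rather than Squid code. `ReservationId`, `ConcurrentHelper` and `ChannelPool` are hypothetical names, and the helper is assumed to keep each channel's auth session separate. A reservation pins a (helper, channel) pair, so a stalled client blocks one channel instead of a whole helper process.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical reservation key: which helper process, and which
// concurrency channel on that helper.
struct ReservationId {
    std::size_t helper;
    std::size_t channel;
    bool operator==(const ReservationId &o) const {
        return helper == o.helper && channel == o.channel;
    }
};

// Hypothetical helper process with a fixed number of channels.
struct ConcurrentHelper {
    explicit ConcurrentHelper(std::size_t channels) : busy(channels, false) {}
    std::vector<bool> busy; // busy[c] is true while channel c holds a session
};

class ChannelPool {
public:
    ChannelPool(std::size_t helpers, std::size_t channelsPerHelper)
        : helpers_(helpers, ConcurrentHelper(channelsPerHelper)) {}

    // Reserves the first free channel on any helper.
    std::optional<ReservationId> reserve() {
        for (std::size_t h = 0; h < helpers_.size(); ++h) {
            for (std::size_t c = 0; c < helpers_[h].busy.size(); ++c) {
                if (!helpers_[h].busy[c]) {
                    helpers_[h].busy[c] = true;
                    return ReservationId{h, c};
                }
            }
        }
        // Every channel is reserved: this is where an overload policy
        // (abort Squid, or kill and restart the helper) would apply.
        return std::nullopt;
    }

    void release(const ReservationId &id) {
        assert(id.helper < helpers_.size());
        assert(id.channel < helpers_[id.helper].busy.size());
        helpers_[id.helper].busy[id.channel] = false;
    }

private:
    std::vector<ConcurrentHelper> helpers_;
};
```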
This patch is going most of the way towards that already by using reservation-ID to replace hardcoded class dependencies, and moving a lot of code to the Base class. But it does not go far enough to make the change safe to add to Squid IMO.
NP: I have only briefly checked the code to be sure this was not actually doing the concurrency change yet without mentioning it. However, one other major issue stands out;
StatefulGetFirstAvailable() is a C-style accessor method for the stateful_helper, we simply have not managed to make it one properly yet. The value it is given is the context within/for which the caller needs it to find a usable *_server. Changing that context in order to find a different server result (for some other context) is not an appropriate thing to be doing.
The only thing in the new code for StatefulGetFirstAvailable() which might lead to it needing non-const is the HelperServerBase::reserved() method being indirectly called. That method should be "virtual bool reserved() const = 0". Fixing that should make the
Yes this method can be const.
StatefulGetFirstAvailable() API const'ness change unnecessary.
This change was not required in any case. Sorry.
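The const-correctness point can be illustrated with a small sketch. Only the `HelperServerBase::reserved()` name and its suggested signature come from the thread; `SketchServer` and `firstAvailable` are hypothetical stand-ins. Once `reserved()` is declared const, a lookup can take its servers by const reference. If `reserved()` were non-const, the call inside the loop would not compile.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-in for the base class named in the thread.
class HelperServerBase {
public:
    virtual ~HelperServerBase() = default;
    virtual bool reserved() const = 0; // the signature suggested above
};

// Hypothetical concrete server used only for this illustration.
class SketchServer : public HelperServerBase {
public:
    explicit SketchServer(bool isReserved) : reserved_(isReserved) {}
    bool reserved() const override { return reserved_; }

private:
    bool reserved_;
};

// The lookup only reads state, so both the container and the result
// can stay const.
const SketchServer *firstAvailable(const std::vector<SketchServer> &servers) {
    for (const auto &s : servers)
        if (!s.reserved()) return &s;
    return nullptr;
}
```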
-1 for now, and please do not apply this until the replay attack problem is fully resolved.
My sense is that the attack problem can never be "fully" resolved.
But the patch does not try to solve any attack problem. It tries to solve problems in normal Squid operation.

The problem you described of helpers being hung in reserved state cannot happen during *normal* NTLM/Negotiate operations. Normal clients a) complete the auth sequence, or b) terminate the connection quickly and thus trigger cleanups to happen in Squid, or c) are actually still going to produce the token this helper is reserved and waiting for. None of the _normal_ traffic cases are served well by this change.

So either there is a bug in Squid client-connection cleanup (case 'b') which you are not addressing at all. Or, the use-cases you are solving for here are ones where clients are performing a DoS attack against the proxy, whether you understood that or not.

What your patch does is change a basic DoS vulnerability into an account hijacking / replay vulnerability. Even though the replay may be much more difficult to achieve than the original DoS the damage and severity when it happens is far, far worse. Sufficiently so that the original DoS is preferable to have happening.

At the very least the DoS situation is easy to see and remedy. Probably that visibility is why it gets so much complaining and the other (several!) common problems with these helpers get hardly a mention.

This patch plus the client_ip_max_connections option, and a small reservation-timeout value should be enough for most cases.
Using concurrency for stateful helpers requires changes in existing stateful helpers distributed with squid and other custom helpers used by many users. The changes investigated by this patch are required even if we are going to support concurrency, to allow users to keep using their existing helpers.

I'm not proposing that we change the interface to concurrency always-on. It should be consistent with the concurrency on the other helper interfaces, which is default-off but available for those who need and can use it. For all those same reasons you mention.

For NTLM the common (only?) helper is the Samba one. That apparently already supports concurrency. So if that works with our latest type of concurrency numbering, then the change should not be a huge problem for NTLM users.

For Kerberos the common helpers are Marcus ones which we ship with Squid. Some coordination will be needed there to keep his separately distributed copy in sync, but relatively easy.

As you said "this patch is going most of the way towards".
I do not think it is a good idea to reject this patch only because it "does not go far enough". In a second step and if required, we can implement concurrency on stateful helpers.

In this case not going far enough leaves open a far worse vulnerability than the one being resolved. I'm rejecting it primarily because of the replay/hijack addition being worse than the DoS.

If you can come up with some other way to avoid the DoS pains without getting into worse problems I am open to temporary measures. But given the direction this code has gone it seems to me that going for the concurrency support is less additional work than the analysis required to find+code good alternatives would take.

FWIW; I am tempted by the old idea of just letting Squid abort the helpers that are hung and starting new ones afresh. But experiences coming back from people using the dynamic-helpers feature show that even that does not help much. Instead of Squid aborting with the FATAL message it continues until the kernel oom-killer aborts it instead anyway, or for some very high-performance proxies the TCP stack can still run out of sockets in the half second or so before the new helpers have started producing responses.

 Up to you whether you think implementing that as a quick-fix is worth it.


Amos
_______________________________________________
squid-dev mailing list
[email protected]
http://lists.squid-cache.org/listinfo/squid-dev
