On 2017-01-06 10:24, Eduard Bagdasaryan wrote:
Hello,

This patch fixes URI schemes to be case-insensitive (r14802 regression).

The bug was introduced by r14802 (better support of unknown
URL schemes). AnyP::UriScheme constructor had a problem: the
explicitly-specified img parameter (always provided by URL::setScheme())
was used for all schemes, not only for unknown ones (as it obviously
should be).

The situation is not so obvious. Alex and I already had a rather long audit discussion about the finer points of that before the 14802 commit. The problem is rooted in the unfortunate fact that transport protocol labels are conflated into the scheme types, and are both upper-case + case-sensitive.

For context please read:
<http://lists.squid-cache.org/pipermail/squid-dev/2016-June/005995.html>


As a result, the code responsible for lower-case
transformation was not executed.

That is intentional behaviour for several reasons;

1) it improves transparency and reduces risks from proxy fingerprinting by systems probing the URI scheme handling by the transport agents (ie, fingerprinting Squid).

2) unknown URI schemes are not necessarily handled properly as case-insensitive by the experimental agents sending and receiving the messages.

also, (and more importantly);

3) the transport protocol label and URI scheme label are still conflated. The scheme down-casing procedure is _only_ applicable when translating from ProtocolType_str labels (upper case) to scheme label (lower case). - as we improve the code to remove the conflation problem the uses of that logic path should trend towards zero, then we can drop it entirely.

- with unknown protocols the UriScheme object will have to be used as the transport representation (ew, but that what we have today). So retaining the upper case version there on unknown things is currently important while being not-harmful for the scheme comparisons which should be ignoring case anyhow.

- we do not yet properly handle scheme comparisons for unknown schemes, so down-casing for that reason is not part of the existing code. The #1 security consideration will need to be balanced into the logic when that is eventually fixed.


4) storing the down-cased string for registered protocols of each URI avoids many explicit down-casing operations on use/display of the URI scheme. Note that is specific to the known protocols.

- There are many more points of code displaying the scheme than setting it. So this is a significant performance gain despite the overhead of allocating and own-casing a new SBuf per UriScheme object your patch notes with an XXX.


5) the most common cases for this method being called are internal uses where img is passed with a static lower-cased value, and input from urlParse() parsing an absolute-URI from clients sending properly lower-cased URI schemes.

- So checking for (img && scheme == (NONE or UNKNOWN)) is unnecessary extra logic to begin with and by causing the lower-casing logic to be applied for registered schemes is also re-adding a major performance regression in the common traffic case.



There are a couple of XXX and one of them (about "broken caller" for
PROTO_NONE) needs clarification. This caller uses PROTO_NONE
(i.e., absent scheme) together with "http" scheme image inside
clientReplyContext::createStoreEntry() which looks inconsistent and
probably is a bug. Am I missing anything there?

Thats an odd case. I'm not exactly sure what we should be doing there. Thus the unusual parameter combo. It was done that way to minimize breakage with old code - so the request can be detected as missing scheme (avoid confusing with a real HTTP URL) if anything tries to compare it, but still generates a reasonably correct URL if its dumped to store files.

Amos

_______________________________________________
squid-dev mailing list
[email protected]
http://lists.squid-cache.org/listinfo/squid-dev

Reply via email to