On 7/01/2017 5:41 a.m., Eduard Bagdasaryan wrote:
>
> On 06.01.2017 15:27, Amos Jeffries wrote:
>>> As a result, the code responsible for lower-case
>>> transformation was not executed.
>>
>> That is intentional behaviour for several reasons;
>>
>> 1) it improves transparency and reduces risks from proxy
>> fingerprinting by systems probing the URI scheme handling by the
>> transport agents (ie, fingerprinting Squid).
>>
>> 2) unknown URI schemes are not necessarily handled properly as
>> case-insensitive by the experimental agents sending and receiving the
>> messages.
>>
>> also, (and more importantly);
>
> The patch does not change this, i.e., "unknown" images are still stored
> without down-casing.
>
>>
>> 3) the transport protocol label and URI scheme label are still
>> conflated. The scheme down-casing procedure is _only_ applicable when
>> translating from ProtocolType_str labels (upper case) to scheme label
>> (lower case).
>
> To avoid misunderstanding, note that the unpatched Squid did not
> down-case at all (i.e. for known ProtocolType_str schemes too). In other
> words, when receiving HTTP://example.com, "HTTP" was not down-cased.
> This alone violates HTTP caching rules: two different cache entries
> were created for HTTP://example.com and http://example.com requests.
>
>>
>>
>> 4) storing the down-cased string for registered protocols of each URI
>> avoids many explicit down-casing operations on use/display of the URI
>> scheme. Note that is specific to the known protocols.
>>
>> - There are many more points of code displaying the scheme than
>> setting it. So this is a significant performance gain despite the
>> overhead of allocating and down-casing a new SBuf per UriScheme object
>> your patch notes with an XXX.
>
> I am not against allocating and storing a down-cased SBuf "image_" (for
> performance's sake). The related XXX is about allocating an SBuf, which
> we can probably avoid in a future optimization.
> For example, we could do this by converting ProtocolType_str to a const
> array of SBufs, thus avoiding image_ member allocation when dealing
> with known protocols.
>

That will not help avoid the re-allocation, since ProtocolType_str should
be upper case and the COW property will reallocate for lower-casing anyway.

I'm thinking the quick-and-dirty way is to just lowercase the 'proto'
variable in the url.cc urlParse() function. Doing that in the for-loop
where it is copied from 'src' would be easiest.

 - it breaks the case preservation on unknown schemes a little bit. But
since they are supposed to be case-insensitive anyway, the harm is minimal.

Amos

_______________________________________________
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev
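
For illustration only, here is a rough sketch of the lowercase-while-copying idea mentioned above. It is not the actual urlParse() code: the helper name copySchemeLowercased and the use of std::string (rather than Squid's own buffers) are assumptions made to keep the example self-contained.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, NOT Squid's urlParse(): copies the scheme part of
// 'src' (everything before the first ':') into 'proto', lower-casing each
// character during the copy so no second pass or extra buffer is needed.
// If 'src' contains no ':', the whole input is copied lower-cased.
static std::string copySchemeLowercased(const std::string &src)
{
    std::string proto;
    for (std::size_t i = 0; i < src.size() && src[i] != ':'; ++i) {
        // Cast to unsigned char first: std::tolower() has undefined
        // behaviour for negative values other than EOF.
        const unsigned char ch = static_cast<unsigned char>(src[i]);
        proto += static_cast<char>(std::tolower(ch));
    }
    return proto;
}
```

With this sketch, copySchemeLowercased("HTTP://example.com") and copySchemeLowercased("http://example.com") both give "http", so the two requests would map to the same scheme label. An unknown scheme such as "Foo-Bar" also comes out as "foo-bar", which is the loss of case preservation noted above.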