-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 21/11/2014 5:38 p.m., Alex Rousskov wrote: > On 11/17/2014 05:56 PM, Amos Jeffries wrote: > >> For now the detection is only added during parsing of regex >> tokens or files. > > And it should probably stay that way: We cannot easily add support > for \-based escaping in bare strings because in some of those > strings \ is meaningful on its own (unfortunately). > > >> + while (ConfigParser::RecognizeQuotedPair_ && *(nextToken-1) >> == '\\') { > > Are you sure that nextToken never points at the start of a buffer, > especially when parsing parameters loaded from external files? If > you are absolutely sure, this guarantee should be documented where > ConfigParser::TokenParse() is defined. Otherwise, the code has to > be revised (but still support regex parameter files that start with > a backslash).
I dont believe that is possible. nextToken always points at the current buffer, although it may point at the \0 c-string terminator. There is an un-conditional strcspn() call prior to the while-loop which will move the nextToken pointer down into the buffer at least one character in distance. If the buffer starts with a '\' character the nextToken should be pointing at its escaped value character. If it does not start with that then it will be somewhere down the buffer after the next delimiter. I have moved the \0 terminator check up into the while() condition so that it will not do a [-1] de-reference if the buffer is ever of 0-length. Although that should have been found and protected against already by the if-condition at the start of the method. > > The second problem with this code is that it incorrectly handles > "\\X" AFAICT: X should not be escaped, even though it is preceded > by a backslash (because that backslash is itself escaped). This code does not un-escape anything. All it does is notice '\' escapes and ensures that both the '\' and character following are included in the token (unless that following character is the \0 terminator). The regex pattern parser is responsible for any un-escaping and processing of the token itself. It accounts for tokens starting with a '\', for multiple sequential '\', and for "loaded file lines wrongly ending with a '\'. In that latter case by treating them as an un-escaped '\' character in the pattern token. > >> + <tag>configuration_includes_quoted_values</tag> + <p>Regex >> expression values can not be parsed when this directive is + >> configured to <em>ON</em>. > > > This is a little misleading because, IIRC, the > configuration_includes_quoted_values hack can be turned ON for just > a section of the configuration file.\ I suggest s/when this > directive is configured to ON/in parts of configuration where this > directive is set to ON/. > Done. > > Please consider s/can not/cannot/ for improved consistency: > > $ fgrep -i 'can not' src/*pre | wc -l 7 $ fgrep -i 'cannot' > src/*pre | wc -l 19 > Done for the release notes text that made you point this out. The others are unrelated to this patch. Will do as a separate cleanup change. > >> + <tag>configuration_includes_quoted_values</tag> + <p>Regex >> expression values can not be parsed when this directive is + >> configured to <em>ON</em>. Instead Squid now accepts regex >> \-escaped + characters including escaped whitespace. > > > We should clarify that \-escaped characters are supported when > parsing regular expressions, not just any token. I also suggest > moving the last sentence to its own paragraph as it has no relation > to configuration_includes_quoted_values: It is directly related to the previous sentence. eg. We do not support A. Instead we support B. Also, all <p> entries following <tag> are documented changes to that same <tag>. I opted for documenting this under configuration_includes_quoted_values since the alternative would be listing each and every directive in Squid which has a regex option and repeating the same two sentences about how it operates in regards to when a previous configuration_includes_quoted_values was set to ON. I'm in half a mind to temporarily disable the configuration_includes_quoted_values when calling RegexToken(). But since we already have a hard error when the two collide I left it as-is and try to document what to do instead of placing quotes around the pattern. > > <p>When parsing regular expressions, Squid now accepts \-escaped > characters, including escaped whitespace.</p> > > If "escaped whitespace" means \s, I would say so explicitly. If it > means "\ ", I would say "including escaped space characters". Escaped whitespace means \ followed by whitespace. As per the bug report use case. acl foo browser Windows\ XP Windows.NT has two patterns in it equivalent to the single pattern: acl foo browser Windows(\ XP|.NT) Fixed. >> + static bool RecognizeQuotedPair_; ///< The next tokens may >> containing quoted-pair (\-escaped) characters > > s/containing/contain/ Fixed. > > > Except for the nextToken-1 concerns, the above corrections are > cosmetic, of course. > Fixed all, and applied to trunk as rev.13730 Amos -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (MingW32) iQEcBAEBAgAGBQJUfyJVAAoJELJo5wb/XPRjRmsIAIv+hgXFiQd0St4U3wFYneSR RGlFpnAGs8NyYiihxf7nnGBgYzVvbBbzLEIPSAzemGGfhfIsmu36ruSCj74qsA77 zTLt6bBwmSVUwrjrPdWmua64rswB4WOibtdaBr2+qCiy82G6CrbS8aP0CSC7h7p3 Uyvl1JzJHDFxz8qj31KJv+gvFo5RKYjD7p1Q+OJYkwhFweVk5G4C9Pr3olmyY7zn cV4y07L+8NFt/+r2n0rceVBKjiITvAGXr3X9cKDhdJwUE102TqAY1XaPaSl3UQJE GKrz0srs3A1Vn0JVyq7xf4GP4xqxrqNhLP0Uoh64oCn+l7ZXCuPyhVnghYg48Ck= =O2Dd -----END PGP SIGNATURE----- _______________________________________________ squid-dev mailing list squid-dev@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-dev