I concur with Alex that (4) is preferable: it does not break old configurations, reuses existing mechanisms, and can be applied only when and where required. I have one more option for your consideration: escaping with a backtick (e.g., `n) instead of a backslash. PowerShell, for example, uses this approach.

5a. Recognize just `n escape sequence in squid.conf regexes.

5b. Recognize all '`'-based escape sequences in squid.conf regexes.

Pros: Easier upgrade: backticks are rare in regular expressions (compared to '%' or '/'), so old regexes probably need no conversion at all.
Pros: Simplicity: no double-escaping is required (unlike in (1b)).
Cons: Although specifying common escape sequences such as `n, `r, or `t should be straightforward, we would still need to devise a way to specify an arbitrary character (i.e., by its code).
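
To make 5a/5b a bit more concrete, here is a rough sketch in Python (for illustration only; Squid itself is C++). Everything in it is an assumption rather than a proposal for the actual parser: the function name expand_backtick_escapes, the escape table, the use of a doubled backtick for a literal backtick, and the "strict" switch that rejects unknown sequences. It only shows that backslash sequences meant for the regex engine pass through untouched.

```python
import re

# Hypothetical escape table. Only `n, `r and `t are mentioned in this thread;
# the doubled backtick for a literal backtick is an assumption of this sketch.
BACKTICK_ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "`": "`"}


def expand_backtick_escapes(pattern, strict=True):
    """Expand `X sequences and leave backslash escapes for the regex engine."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch != "`":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(pattern):
            raise ValueError("dangling backtick at end of pattern")
        code = pattern[i + 1]
        if code in BACKTICK_ESCAPES:
            out.append(BACKTICK_ESCAPES[code])
        elif strict:
            # Assumed behavior: report typos instead of silently accepting them.
            raise ValueError("unsupported escape sequence: `" + code)
        else:
            # Lenient mode keeps the unknown sequence as written.
            out.append("`" + code)
        i += 2
    return "".join(out)


# The regex backslash in \d needs no doubling; only `n is rewritten.
expanded = expand_backtick_escapes(r"Host: \d+`nAccept")
matched = re.search(expanded, "Host: 42\nAccept: */*") is not None
```

With this shape, an old regex that contains no backticks comes out unchanged, which is the upgrade property claimed above. The open question of arbitrary character codes is deliberately left out of the sketch.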


HTH,

Eduard.


On 20.01.2022 00:32, Alex Rousskov wrote:
Here is a fairly representative sample:

1a. Recognize just \n escape sequence in squid.conf regexes
    Pros: Simple.
    Cons: Converting old regexes[1] requires careful checking[2].
    Cons: Cannot detect typos in escape sequences. \r is accepted.
    Cons: Cannot address other, similar use cases (e.g., ASCII CR).

1b. Recognize all C escape sequences in squid.conf regexes
    Pros: Can detect typos -- unsupported escape sequences.
    Cons: Poor readability: Double-escaping of all for-regex backslashes!
    Cons: Converting old regexes requires non-trivial automation.


2a. Recognize %byte{n} logformat-like sequence in squid.conf regexes
    Pros: Simple.
    Cons: Converting old regexes[1] requires careful checking[3].
    Cons: Cannot detect typos in logformat-like sequences.
    Cons: Does not support other advanced use cases (e.g., %tr).

2b. Recognize %byte{n} and logformat sequences in squid.conf regexes
    Pros: Can detect typos -- unsupported logformat sequences.
    Cons: The need to escape % in regexes will surprise admins.
    Cons: Converting old regexes requires (simple) automation.


3. Use composition to combine regexes and some special strings:
    regex1 + "\n" + regex2
    or
    regex1 + %byte{10} + regex2
    Pros: Old regexes can be safely used without any conversions.
    Cons: Requires new, complex composition expressions/syntax.
    Cons: A bit difficult to read.
    Cons: Requires a lot of development.


4. Use 2b but only when regex is given to a special function:
    substitute_logformat_codes(regex)
    Pros: Old regexes can be safely used without any conversions.
    Pros: New regexes do not need to escape % (by default).
    Pros: Extendable to old regex configuration contexts.
    Pros: Extendable to non-regex configuration contexts.
    Pros: Reusing the existing parameters(...)-like call syntax.
    Cons: A bit more difficult to read than 1a or 2a.
    Cons: Duplicates "quoted string" approach in some directives[4].
    Cons: Requires arguing about the new function name:-).
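
    For illustration, the expansion step behind option 4 could look roughly like the Python sketch below (Squid itself is C++). Only the substitute_logformat_codes name and the %byte{n} syntax come from this message. The rest is assumed for the sake of the example: %% standing for a literal percent sign, rejecting every other %-sequence instead of handing it to the real logformat machinery, limiting n to 0..255, and escaping the inserted character so that it is matched literally.

```python
import re

# Matches "%%" or "%byte{N}" at a given offset. Any other "%" is rejected
# below; a real implementation would hand it to the logformat code instead.
_TOKEN = re.compile(r"%(?:(%)|byte\{(\d+)\})")


def substitute_logformat_codes(pattern):
    """Expand %byte{N} and %% in a regex; reject other %-sequences."""
    out = []
    pos = 0
    while True:
        i = pattern.find("%", pos)
        if i < 0:
            out.append(pattern[pos:])
            break
        out.append(pattern[pos:i])
        m = _TOKEN.match(pattern, i)
        if m is None:
            raise ValueError("unsupported %-sequence at offset " + str(i))
        if m.group(1):
            out.append("%")
        else:
            value = int(m.group(2))
            if value > 255:
                raise ValueError("byte value out of range: " + str(value))
            # Assumption: the byte is matched literally, so it is escaped in
            # case it happens to be a regex metacharacter such as '.'.
            out.append(re.escape(chr(value)))
        pos = m.end()
    return "".join(out)


# A regex spanning two header lines, with the LF written as %byte{10}.
expanded = substitute_logformat_codes(r"GET /\S+%byte{10}Host:")
matched = re.search(expanded, "GET /index.html\nHost: example.com") is not None
```

    Regexes that are not passed through the function are never scanned for %, which is why old configurations keep working without conversion.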


Given all the pros and cons, I think we should use option 4 above.

Do you see any better options?
_______________________________________________
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev
