On 22/01/22 08:36, Alex Rousskov wrote:
TLDR: I am adding solution #6 into the mix based on Amos email (#5 was
taken by Eduard). Amos needs to clarify why he thinks that Squid master
branch cannot accept STL-based regexes "now". After that, we can decide
whether #6 remains a viable candidate. Details below.


On 1/21/22 12:42 PM, Amos Jeffries wrote:
On 20/01/22 10:32, Alex Rousskov wrote:
We have a use case where a regex in squid.conf should contain/match
a new line [...] This email discusses the problem and proposes how
to add a new line (and other special characters) to regexes found
in squid.conf and such.


With the current mix of squid.conf parsers this RFC seems irrelevant to me.

I do not understand the relationship between "the current mix of
squid.conf parsers" and this RFC relevance. This RFC is relevant because
it is about a practical solution to a real problem facing real Squid admins.


Sentence #2 of the RFC explicitly states that admin needs are not relevant: "I do not know whether there are similar use cases with the existing squid.conf regex directives"

The same sentence delimits RFC scope as: "adding a _new_ directive that will need such support."

That means the syntax defining how the regex pattern is configured does not yet exist. It is not necessary for the developer to design their _new_ UI syntax in a way that exposes admin to this problem in the first place. Simply design the



Whether Squid has one parser or ten, good ones or bad ones, is relevant
to how the solution is implemented/integrated with Squid, of course, but
that is already a part of the analysis on this thread.


Very relevant. RFC cites "squid.conf preprocessor and parameter parser use/strip all new lines" as a problem.

I point out that this behaviour depends on *which* config parser is chosen to be used by the (again _new_) directive. It should be an implementation detail for the dev, not a design consideration for this RFC.



The developer designing a new directive also writes the parse_*()
function that processes the config file line. All they have to do is
avoid using the parser functions which implicitly do the problematic
behaviour.

Concerns regarding the overall quality of Squid configuration syntax and
upgrade paths expand the reach of this problem far beyond a single new
directive, but let's assume, for the sake of the argument, that all we
care about is a new parsing function. Now we need to decide what syntax
that parsing function will use. This RFC is about that decision.


Nod.

I must state that I do not see much in the way of squid.conf syntax discussion in the RFC text. It seems to focus a lot on syntax inside the regex pattern.

IMO regex is such a complicated situation that we should avoid having special things inside or on top of its syntax. That is a recipe for admin pain.


...
There was a plan from 2014 (re-attempted by Christos 2016) to migrate
Squid from the GNURegex dependency to more flexible C++11 regex library
which supports many regex languages. With that plan the UI would only
need an option flag or pattern prefix to specify which language a
pattern uses.

I agree that one of the solutions worth considering is to use a regex
library that supports different regex syntax. So here is the
corresponding entry for solution based on C++ STL regex:

6. Use STL regex features that support \n and similar escape sequences
Pros: Supports much more than just advanced escape sequences!
Pros: The new syntax is easy to document by referencing library docs.

Pro: we do not have to write any part of pattern matching ourselves. Simpler config parser.

Pro: we do not have to maintain custom code supporting special behaviours in regex pattern configuration.

Pro: we do not have to provide additional user support for non-standard squid.conf patterns.

Pro: we do not have to waste brain cycles designing how to integrate syntax into regex patterns cleanly.
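
For illustration, a minimal sketch of what solution #6 relies on: with the default ECMAScript grammar, std::regex itself interprets the two-character sequence backslash + n inside the pattern text as a newline, so the config parser would not need to translate it. The helper name matchesConfiguredPattern() is hypothetical (it is not a Squid function), and the sketch assumes a toolchain with a working std::regex, i.e. not the GCC 4.8 builds mentioned in this thread.

```cpp
#include <regex>
#include <string>

// Hypothetical helper, not part of Squid: compiles the pattern text exactly
// as it might appear on a squid.conf line and searches the input with it.
bool matchesConfiguredPattern(const std::string &patternText, const std::string &input)
{
    // std::regex defaults to the ECMAScript grammar, which treats the
    // two characters backslash + 'n' in the pattern as a newline.
    const std::regex pattern(patternText);
    return std::regex_search(input, pattern);
}
```

A caller would pass the raw pattern text, for example R"(Host:.*\nVia:)", and an input that contains a real newline between the two header lines.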


Cons: Requires serious changes to the internal regex support in Squid.

IIRC, the changes are not as serious as they may seem. The largest part is the squid.conf parser alteration to accept the proposal's flag/prefix and patterns cleanly. Beyond that is just a switch of container which is easy (not trivial, just easy).


Cons: Miserable STL regex performance in some environments[1,2]?

IMO this is balanced by Squid existing regex being well known to have similar performance issues.


Cons: Converting old regexes requires (complex) automation.

Disagree that this is a problem.

GNU regex is the predecessor syntax behind all modern regex variants. We can retain GNUregex as the default pattern language and require a language flag/prefix for patterns needing modern features.
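
A rough sketch of how such a prefix could select the pattern language while unprefixed patterns keep a default. Everything named here is an assumption for illustration: the "ecma:" and "awk:" prefixes and both function names are hypothetical, and mapping unprefixed (legacy) patterns to the POSIX extended grammar of std::regex is a guess at the closest equivalent, not a statement about how Squid treats GNUregex patterns today.

```cpp
#include <regex>
#include <string>
#include <utility>

// Hypothetical prefixes for illustration only; squid.conf defines none of these.
// Returns the std::regex grammar flag and the pattern with any prefix removed.
std::pair<std::regex_constants::syntax_option_type, std::string>
splitGrammarPrefix(const std::string &configured)
{
    static const std::string ecmaPrefix = "ecma:";
    static const std::string awkPrefix = "awk:";
    if (configured.compare(0, ecmaPrefix.size(), ecmaPrefix) == 0)
        return {std::regex_constants::ECMAScript, configured.substr(ecmaPrefix.size())};
    if (configured.compare(0, awkPrefix.size(), awkPrefix) == 0)
        return {std::regex_constants::awk, configured.substr(awkPrefix.size())};
    // Assumption: patterns without a prefix are compiled as POSIX extended.
    return {std::regex_constants::extended, configured};
}

// Compiles a configured pattern using the grammar chosen by its prefix.
std::regex compileConfigured(const std::string &configured)
{
    const auto parts = splitGrammarPrefix(configured);
    return std::regex(parts.second, parts.first);
}
```

With this shape, existing patterns need no conversion, and only patterns that want newer features carry a prefix.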


Cons: Requires dropping GCC v4.8 support.
Cons: Amos thinks Squid cannot support STL regex until 2024.

I am honoured that you consider my opinion to be of such importance.

But, seriously, the technical part of my earlier statement is already covered by the GCC 4.8 line.


[2] STL does not allow us to define a custom allocator for its regexes.
Various STL implementations have various hidden workarounds, but we will
be at their (varying) mercy.


That is an interesting point. And probably should be a Con in its own right.
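
To make footnote [2] concrete: in the standard library, std::basic_regex is parameterized only on the character type and a traits class, while std::match_results does accept an allocator. The compile-time checks below merely restate those standard declarations; they say nothing about how a particular STL implementation allocates memory internally when compiling a pattern.

```cpp
#include <memory>
#include <regex>
#include <string>
#include <type_traits>

// std::regex is basic_regex<char>: its only template parameters are the
// character type and a traits class, so there is no allocator parameter.
static_assert(std::is_same<std::regex,
                           std::basic_regex<char, std::regex_traits<char>>>::value,
              "basic_regex takes CharT and Traits only");

// Match results, by contrast, take an allocator (std::allocator by default).
using Iterator = std::string::const_iterator;
static_assert(std::is_same<std::smatch,
                           std::match_results<Iterator,
                               std::allocator<std::sub_match<Iterator>>>>::value,
              "match_results has an allocator parameter");
```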



That plan was put on hold due to feature-incomplete GCC 4.8 versions
being distributed by CentOS 7 and RHEL needing to build Squid.

... and serious/substantiated performance concerns[1]. They may have
been addressed by STL implementations since then, but my quick check and
the impossibility of solving [2] without breaking ABI suggest that at
least some of these issues still remain.


One Core Developer (you Alex) has repeatedly expressed a strong opinion
vetoing the addition/removal of features to Squid-6 while they are
still officially supported by a small set of "officially supported"
Vendors. RHEL and CentOS being in that set.

Sorry, I have no idea what you are talking about.


Your latest voicing of it was in <http://lists.squid-cache.org/pipermail/squid-dev/2021-December/009743.html>

> "Any known Squid regression affecting the "main" environment should block the
> PR introducing that regression IMO. I see no need to limit this to
> "build and unit tests" regressions"
The definition of "main" under discussion in that thread never reached consensus to change away from the existing OS represented by the Jenkins 5-pr-test nodes. So (for now) it still includes LTS versions of RHEL / CentOS 7 shipping the broken GCC 4.8.x std::regex.



Amos
