On 27/09/2013 5:35 a.m., Alex Rousskov wrote:
On 09/26/2013 10:02 AM, Amos Jeffries wrote:
Last I saw on the new strict configuration issues was that Alex was
requesting final resolution of the squid.conf syntax for regex pattern
tokens in strict parse mode before it goes to 3.4.
I did not request that the RE resolution is made before the code goes
into v3.4. IIRC, the committed changes essentially disable RE support in
strict mode. That should not introduce backward compatibility problems
AFAICT.
What I did consider important is that the "foo=bar" decision is made
before the committed changes go into v3.4. Here is a quote from my
2013/08/28 email (I assume that is the email you refer to above):
As I wrote earlier, the 'foo="bar and baz"' issue worries me, but I
think we can discuss that after your commit. The important part is for
Amos not to pull your changes into v3.4 until that discussion is over.
The reason I wanted us to reach a decision is to avoid telling v3.4
users that the syntax has changed yet again. However, very few users are
going to start using the bad
"foo=bar and baz"
instead of the natural
foo="bar and baz"
syntax so we can probably go ahead with the pull if needed.
Aha. Okay, I misremembered. That is a better state than I was thinking.
Yes, I agree that very few (or none) are going to use the "foo= bar"
style in what will hopefully be another shortish lifecycle.
At this point I am very much in favour of keeping the foo= syntax on
grounds of it being so familiar and well published that removing it will
be a major amount of pain to a lot of people.
That objection is
essentially blocking 3.4.0.2 release which requires several of the other
fixes in the patch.
There are other reasons to worry about that parsing change, but I do not
think RE support is holding us hostage here. I hope the above clarifies.
As for "other reasons", see below and my next email.
Christos, Do you have any unseen progress on that last remaining piece
of the new parser?
Christos has made a lot of parsing improvements since the last commit.
However, I think we need to re-evaluate our overall approach to this
problem. I told Christos as much yesterday, but he did not have a chance
to respond yet. I will forward my email to him here although it is a bit
rough. Christos, if you are reading this, please feel free to comment
here instead of responding to my private email.
IMO;
I have kind of been favouring regex( some (pattern) ) since we have
now added function(...) style to squid.conf for parameters(things). Note
that brackets can be easily counted to skip the pattern's internal ( and
) groupings and \( \) literals, leaving an easily identifiable
terminator character for regex(...)some_garbage_token .
I share the "regex" prefix direction (see my next email) but since
parentheses are used extensively in REs, I do not think they are a good
default. It would be very difficult for admins to understand correctly
which parentheses they need to escape inside the RE and how.
As I said the bracket counting is very easy to do. We already have the
guarantee from regex syntax itself that all non-escaped ( and ) are
going to be paired. We just need to absorb the initial '(' from "regex("
and count scopes++ for each ( and scopes-- for each ) until we hit )
with scopes=0. The middle bit is guaranteed to be the pattern string, so
drop the trailing ')' and return from the regex tokenize step.
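
The counting steps described above can be sketched roughly as follows.
This is only an illustration in Python, not Squid's parser code: the
function name extract_regex_token and its return shape are hypothetical.
It assumes, as the message states, that unescaped ( and ) in the pattern
are balanced, and it treats a backslash as escaping the next character so
that \( and \) are not counted.

```python
def extract_regex_token(text, prefix="regex("):
    """Split 'regex(PATTERN)rest' into (PATTERN, rest).

    Hypothetical helper for illustration only; not Squid code.
    Assumes unescaped parentheses inside PATTERN are balanced.
    """
    if not text.startswith(prefix):
        raise ValueError("token does not start with " + prefix)
    scopes = 0            # depth of ( ) groups inside the pattern
    i = len(prefix)       # the initial '(' of "regex(" is absorbed here
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2        # skip the escaped character, e.g. \( or \)
            continue
        if ch == "(":
            scopes += 1
        elif ch == ")":
            if scopes == 0:
                # This ')' closes regex(...): drop it and return.
                return text[len(prefix):i], text[i + 1:]
            scopes -= 1
        i += 1
    raise ValueError("unterminated regex( token")


if __name__ == "__main__":
    print(extract_regex_token("regex(some (pattern))some_garbage_token"))
```

Whatever follows the closing ) is returned separately, so the caller can
decide how to treat trailing text such as some_garbage_token.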
I am objecting to the suggested use of // on grounds that it is too
easily confused with perl regex s/pattern/g syntax and if admins start
entering patterns from that regex language into squid.conf, the
GNU-regex parser will hit undefined problems in ways that are hard to debug.
However we can always add preg(/pattern/) in future when we add support
for that expression type.
What say you?
AFAIK, the /re/ syntax is used by sed, PHP, Javascript, Ruby, and
probably many other tools and languages. It is not specific to Perl and
predates Perl. The /re/ syntax tells admins that they are looking at
some RE, not that they are looking at a Perl (or any other specific
flavor of) RE.
The only usage I've seen it in is the Perl regex and tools like the ones
you mention above which share that syntax. Whichever one came first does
not much matter; Perl is the donkey that carried that regex syntax into
my life, and into the lives of a great many other admins as well.
The main point is that Squid's GNU-based pattern syntax is notably
different in a number of edge cases (omissions mainly) which will trip
people up if they confuse the two. Let's not invite that confusion in the
new-and-improved parser.
The only real problem with /re/ syntax as the default is that it does
not work well with URLs, which are very common in Squid patterns. That
is why I think a string-based "re" may be a better default for Squid.
Which means that it makes escaping mandatory in one form or another.
Which is giant leap #1 down the slippery slope towards
"/http:\\\/\\\/foo\\/i broke
it\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\/?how/"
With string based or any other delimiter (including '/') we cannot
differentiate the pattern token from the delimiter token without
escaping the pattern token, then any escape-characters in the pattern as
well. Given your code expertise you have possibly read the same or
similar language design document I did about this problem.
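
To make the double-escaping concern concrete, here is a minimal Python
sketch of one possible quoting scheme for a delimiter-based token. The
function names quote_for_delimiter and unquote_delimited are hypothetical
and this is not a proposal for the actual squid.conf rules. It only shows
that both the delimiter and the escape character must be escaped for the
pattern to round-trip.

```python
def quote_for_delimiter(pattern, delim="/"):
    """Wrap a pattern in delimiters, escaping what would be ambiguous.

    Hypothetical scheme for illustration: backslashes are doubled first,
    then every delimiter character gets a backslash in front of it.
    """
    escaped = pattern.replace("\\", "\\\\").replace(delim, "\\" + delim)
    return delim + escaped + delim


def unquote_delimited(token, delim="/"):
    """Reverse quote_for_delimiter and return the original pattern."""
    if len(token) < 2 or token[0] != delim or token[-1] != delim:
        raise ValueError("token is not wrapped in the delimiter")
    body = token[1:-1]
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            out.append(body[i + 1])   # keep the escaped character only
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for pattern in ["http://foo/", r"foo\.com/"]:
        token = quote_for_delimiter(pattern)
        print(pattern, "->", token, "->", unquote_delimited(token))
```

A URL pattern such as http://foo/ already needs three escapes under this
scheme, and every backslash the regex itself uses is doubled on top of
that.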
Using () brackets or [] brackets we get that nice pairing guarantee
from regex (in all the flavours I'm aware of) and can apply the above
mentioned algorithm without any escaping necessary at the squid.conf
level. Regex may require escaping of some ( and ) itself but that is
more easily done without any squid escapes getting in the way.
However, it is not urgent to decide this now if my understanding about
RE support in the committed code is correct. The concerns I will
highlight in my next email are far more important because they affect
strict syntax adoption and a lot of code (so if you pull the committed
changes into v3.4 now, we may end up with three rather different code
bases to work with: old, v3.4, and trunk).
Okay. Will wait for that before making a decision.
Amos