Perlstorm #0040
TomC's perlstorm has:

    Figure out way to do

        /$e1 $e2/

    safely, where $e1 might have '(foo) \1' in it,
    and $e2 might have '(bar) \1' in it. Those won't work.

If e1 and e2 are qr// type things, the answer might be to localise the
backref numbers in each qr// expression.

If they are not qr//s, it might still be possible to achieve this: if the
expansion of variables in regexes is done by the regex compiler, it could
recognise this context and localise the backrefs.

Any code like this is going to have real problems with $1 etc. if used
later. Use of assignment in a regex and named backrefs (RFC 112) would make
this a lot safer.

Richard

--
[EMAIL PROTECTED]
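The failure described in the perlstorm item can be reproduced with a short script. This is only a minimal sketch: the sample input strings and the variable names other than $e1 and $e2 are chosen for illustration. It shows that, once both fragments are interpolated, the \1 that came from $e2 refers to the first group of the combined pattern.

```perl
use strict;
use warnings;

# The two fragments from the perlstorm item, held as plain strings.
my $e1 = '(foo) \1';
my $e2 = '(bar) \1';

# After interpolation the pattern reads: (foo) \1 (bar) \1
# Both \1s now refer to the first group, (foo).
my $combined = qr/$e1 $e2/;

# What the author of $e2 presumably intended: "bar" repeated.
my $intended = ("foo foo bar bar" =~ $combined) ? 1 : 0;

# What the combined pattern actually accepts: "foo" after "bar".
my $accidental = ("foo foo bar foo" =~ $combined) ? 1 : 0;

print "intended input matches:   $intended\n";    # 0
print "accidental input matches: $accidental\n";  # 1
```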
Re: Perlstorm #0040
In [EMAIL PROTECTED], Richard Proctor writes:
:TomCs perl storm has:
:
: Figure out way to do
:
:     /$e1 $e2/
:
: safely, where $e1 might have '(foo) \1' in it.
: and $e2 might have '(bar) \1' in it. Those won't work.
:
:If e1 and e2 are qr// type things the answer might be to localise
:the backref numbers in each qr// expression.
:
:If they are not qr//s it might still be possible to achieve if the expansion
:of variables in regexes is done by the regex compiler it could recognise
:this context and localise the backrefs.
:
:Any code like this is going to have real problem with $1 etc if used later,
:use of assignment in a regex and named backrefs (RFC 112) would make this
:a lot safer.

I think it is reasonable to ask whether the current handling of qr{}
subpatterns is correct:

  perl -wle '$a=qr/(a)\1/; $b=qr/(b).*\1/; /$a($b)/g and print join ":", $1, pos for "aabbac"'
  a:5

I'm tempted to suggest it isn't; that the paren count should be local to
each qr{}, so that the above prints 'bb:4'. I think that most people
currently construct their qr{} patterns as if they are going to be handled
in isolation, without regard to the context in which they are embedded.
Why else do they override the embedder's flags, if not to achieve that?

The problem then becomes: do we provide a mechanism to access the nested
backreferences outside of the qr{} in which they were referenced, and if so,
what syntax do we offer to achieve that? I don't have an answer to the
latter, which tempts me to answer 'no' to the former for all the wrong
reasons.

I suspect (and suggest) that complication is the only reason we don't
currently have the behaviour I suggest the rest of the semantics warrant:
that backreferences are localised within a qr{}.

I lie: the other reason qr{} currently doesn't behave like that is that
when we interpolate a compiled regexp into a context that requires it be
recompiled, we currently ignore the compiled form and act only on the
original string. Perhaps this is also an insufficiently intelligent thing
to do.

Hugo
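For comparison, later Perl releases (5.10 onwards) offer relative backreferences, written \g{-1}, which let each fragment refer to "the group just before me" rather than to an absolute number. The sketch below reworks the one-liner above on that assumption; the variable names are illustrative. Note that capture numbering stays global, so the outer group is still $2 rather than the $1 suggested above; only the backreference is resolved locally.

```perl
use strict;
use warnings;

# Each fragment refers to its own most recent group via \g{-1}.
my $first  = qr/(a)\g{-1}/;
my $second = qr/(b).*\g{-1}/;

my $text = "aabbac";
my ($captured, $end);

# Groups in the combined pattern: 1 = (a), 2 = the outer parens, 3 = (b).
if ($text =~ /$first($second)/g) {
    $captured = $2;          # the text matched by the embedded $second
    $end      = pos($text);  # offset just past the match
}

print "$captured:$end\n";    # bb:4
```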
Re: Perlstorm #0040
: I lie: the other reason qr{} currently doesn't behave like that is that
: when we interpolate a compiled regexp into a context that requires it be
: recompiled,

Interpolated qr() items shouldn't be recompiled anyway. They should be
treated as subroutine calls. Unfortunately, this requires a reentrant regex
engine, which Perl doesn't have. But I think it's the right way to go, and
it would solve the backreference problem, as well as many other related
problems.
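The closest existing construct to "treat the qr() as a call" is probably the postponed subexpression (??{ ... }). As documented in perlre for current Perl 5 releases, a pattern run this way gets its own set of capture groups, which are discarded when control returns to the outer pattern. The sketch below assumes such a release; it is not the mechanism proposed in this thread, and the variable names are illustrative.

```perl
use strict;
use warnings;

# The same fragments as in the earlier one-liner, each written in isolation.
my $first  = qr/(a)\1/;
my $second = qr/(b).*\1/;

my $text = "aabbac";
my ($captured, $end);

# Each (??{ ... }) runs its pattern with private capture numbering,
# so the only group the outer pattern sees is the explicit one.
if ($text =~ /(??{ $first })((??{ $second }))/g) {
    $captured = $1;
    $end      = pos($text);
}

print "$captured:$end\n";    # bb:4
```

The trade-off is that the inner captures are not visible outside the sub-match, which is the access question raised earlier in the thread.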
RFC 112 (v3) Assignment within a regex
This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Assignment within a regex

=head1 VERSION

  Maintainer: Richard Proctor [EMAIL PROTECTED]
  Date: 16 Aug 2000
  Last Modified: 23 Sep 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 112
  Version: 3
  Status: Developing

=head1 ABSTRACT

Provide a simple way of naming and picking out information from a regex
without having to count the brackets.

=head1 DESCRIPTION

If a regex is complex, counting the bracketed sub-expressions to find the
ones you wish to pick out can be messy. It is also prone to maintainability
problems if and when you wish to add to the expression. Using (?:) to
suppress capturing helps, but it still gets "complex". I would sometimes
rather just pick out the bits I want within the regex itself.

Suggested syntax: (?$foo= ... ) would assign the string that is matched by
the pattern ... to $foo when the pattern matches.

These assignments would be made left to right after the match has succeeded
but before processing a replacement or other results (or prior to some
(?{...}) or (??{...}) code). There may be whitespace between the $foo and
the "=".

Potentially the $foo could be any scalar LHS, as in (?$foo{$bar}= ... )!
Likewise, the '=' could be any assignment operator.

The camel and the docs include this example:

    if (/Time: (..):(..):(..)/) {
        $hours = $1;
        $minutes = $2;
        $seconds = $3;
    }

This then becomes:

    /Time: (?$hours=..):(?$minutes=..):(?$seconds=..)/

This is more maintainable than counting the brackets and easier to
understand for a complex regex. And one does not have to worry about the
scope of $1 etc.

=head2 Named Backrefs

The first versions of this RFC did not allow for backrefs. I now think this
was a shortcoming. It can be done with (??{quotemeta $foo}), but I find this
clumsy. A better way of using a named backref might be (?\$foo).

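The (?$foo= ... ) and (?\$foo) forms are proposals, not syntax that Perl accepts. The nearest runnable equivalent is the named-capture syntax of later Perl releases (5.10 onwards): (?<name>...) with the %+ hash, and \k<name> for a named backreference. The sketch below rewrites the Time example on that basis; the input strings and the $doubled variable are illustrative.

```perl
use strict;
use warnings;

my $line = "Time: 12:34:56";
my ($hours, $minutes, $seconds);

# Named captures stand in for the proposed (?$hours=..) etc.
if ($line =~ /Time: (?<hours>..):(?<minutes>..):(?<seconds>..)/) {
    ($hours, $minutes, $seconds) = ($+{hours}, $+{minutes}, $+{seconds});
}
print "$hours $minutes $seconds\n";    # 12 34 56

# \k<word> plays the role the RFC gives to a named backref.
my $doubled;
if ("hello hello" =~ /(?<word>\w+) \k<word>/) {
    $doubled = $+{word};
}
print "$doubled\n";                    # hello
```

Unlike the proposal, named captures do not assign to arbitrary scalars; the values have to be copied out of %+ after the match.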
=head2 Scoping

The question of scoping for these assignments has been raised, but I don't
currently have a feel for the "best" way to handle this. Input welcome.

=head2 Brackets

Using this method for capturing wanted content, it might be desirable to
stop ordinary brackets capturing, without needing to use (?:...). I
therefore suggest, as an enhancement to regexes, a /b (bracket?) flag under
which ordinary brackets just group, without capture. In effect they would
all behave as (?:...).

=head1 CHANGES

V3 - added bit about backrefs, and brackets.

=head1 IMPLEMENTATION

Currently all $scalars in regexes are expanded before the main regex
compiler gets to analyse the syntax. This problem also affects several
other RFCs (166 for example). For these (and other) RFCs, the expansion of
variables in regexes needs to be driven from within the regex compiler, so
that the regex can expand as and where appropriate. Changing this should
not affect any existing behaviour.

=head1 REFERENCES

I brought this up on p5p a couple of years ago, but it was lost in the
noise...

RFC 166: Alternative lists and quoting of things

Perlstorm #0040
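The /b flag suggested under Brackets is a proposal only. Later Perl releases (5.22 onwards) provide a /n flag with much the same effect: plain parentheses only group, while named groups still capture. The sketch below assumes such a release; the sample strings and variable names are illustrative.

```perl
use 5.022;
use strict;
use warnings;

# Under /n, ordinary parentheses behave like (?:...), so $1 is not set.
my $plain_captured = 0;
if ("12:34" =~ /(..):(..)/n) {
    $plain_captured = defined $1 ? 1 : 0;
}
print "plain brackets captured: $plain_captured\n";    # 0

# Named groups still capture, so only the wanted field is picked out.
my $minutes;
if ("Time: 12:34:56" =~ /Time: (..):(?<minutes>..):(..)/n) {
    $minutes = $+{minutes};
}
print "minutes: $minutes\n";                           # 34
```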