Re: RFC 331 (v1) Consolidate the $1 and C<\1> notations

Dave Storrs Fri, 29 Sep 2000 12:34:35 -0700


On Thu, 28 Sep 2000, Hugo wrote:

> :=item *
> :/(foo)_C<\1>_bar/
> 
> Please don't do this: write C</(foo)_\1_bar/> or /(foo)_\1_bar/, but
> don't insert C<> in the middle: that makes it much more difficult to
> read.

        Sorry; that was a global-replace error that I missed on
proofreading.

 
> :mean different things:  the second will match 'foo_foo_bar', while the
> :first will match 'foo[SOMETHING]bar' where [SOMETHING] is whatever was
> 
> should be:         foo_[SOMETHING]_bar

        Um, yeah, it should...(jeez...I proofed this like three times,
honest!)  *blush*

 
> :captured in the B<previous> match...which could be a long, long way away,
> 
> This seems a bit unfair. It is just another variable. Any variable
> you include in a pattern, you are assumed to know that it contains
> the intended value - there is nothing special about $1 in this regard.

        Fair enough; the point I was trying to make was that \1 was
captured right here, while $1 was captured long, long ago in a pattern
match far, far away. The visual/cognitive difference is small, but the
programming difference is huge.
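
        To make the difference concrete, here is a minimal sketch in
current Perl 5 syntax. The variable names and sample strings are only
illustrative, and ${1} is written with braces so that it is clearly
separate from the "_bar" that follows it.

```perl
use strict;
use warnings;

# An earlier, unrelated match leaves "baz" in $1.
"baz" =~ /(baz)/ or die "setup match failed";

# ${1} is interpolated when the qr// is evaluated, so this pattern is
# effectively /(foo)_baz_bar/.
my $uses_variable = qr/(foo)_${1}_bar/;

# \1 is resolved by the regex engine during matching and refers to the
# (foo) group of this same pattern.
my $uses_backref = qr/(foo)_\1_bar/;

my @results;
for my $text ('foo_foo_bar', 'foo_baz_bar') {
    push @results, sprintf "%s: backref=%d variable=%d",
        $text,
        ($text =~ $uses_backref  ? 1 : 0),
        ($text =~ $uses_variable ? 1 : 0);
}
print "$_\n" for @results;
# prints:
#   foo_foo_bar: backref=1 variable=0
#   foo_baz_bar: backref=0 variable=1
```

The two patterns differ by only a couple of characters, yet they accept
disjoint sets of strings.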


> :=item *
> :${P1} means what $1 currently means (first match in last regex)
> 
> Do you understand that this is the same variable as $P1? Traditionally,
> perl very rarely coopts variable names that start with alphanumerics,
> and (off the top of my head) all the ones it does so coopt are letters
> only (ARGV, AUTOLOAD, STDOUT etc). I think we need better reasons to
> extend that to all $P1-style variables.

        I do understand that, and I agree with your concern.  Actually, I
didn't think that ${P1} was a particularly good notation even as I was
suggesting it...I just wanted to get the RFC up there before the deadline
so that people could discuss it.

        Having now thought about it more, I think that (?P1) is
better...in other words, make references to the previous pattern match be
a regex _extension_, not a core feature (if that's a valid way to phrase
the distinction).


> What is the migration path for existing uses of $P1-style variables?

        Wherever p526 sees a pattern that contains a $1, it should replace
it with (?P1).

 

> :=item *
> :s/(bar)(bell)/${P1}$2/               # changes "barbell" to "foobell"
> 
> Note that in the current regexp engine, ${P1} has disappeared by the
> time matching starts. Can you explain why we need to change this?
> Note also that if you are sticking with ${P1} either we need to
> rename all existing user variables of this form, or we can no longer
> use the existing 'interpolate this string' (or eval, double-eval etc)
> routines, and have to roll our own for this (these) as well.

        I'm a bit confused by the way this came out but, if I understand
what you're asking, then I believe your concerns are solved by the new
proposed syntax.  Am I right?


> :This may require significant changes to the regex engine, which is a topic
> :on which I am not qualified to speak.  Could someone with more
> :knowledge/experience please chime in?
> 
> Currently the regexp compiler is handed a string in which $variables
> have already interpolated. [...]

        I know there are certain exceptions to this...my Camel III says
(something to the effect of--I don't have it in front of me) "if there is
any doubt as to whether something should be interpolated or left for the
Engine, it will be left for the Engine."

        In any case, I don't think this needs to change.  I'm simply
changing what the names of the variables and backreferences are...\1
becomes (the new) $1, and (the current) $1 becomes (?P1).
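
        As a rough illustration of that renaming, here is a sketch of a
purely textual translation of a pattern string. The function name
translate_pattern is hypothetical, (?P1) is only the notation proposed
in this thread and is not existing Perl syntax, and a real p526 would
need to parse the code properly instead of rewriting text.

```perl
use strict;
use warnings;

# Tokens of interest in an old-style pattern string:
#   group 1: \N      (old backreference)
#   group 2: \X      (any other escape, e.g. \$ or \d; left alone)
#   group 3: ${N}    (old capture variable, braced form)
#   group 4: $N      (old capture variable)
my $TOKEN = qr/\\(\d+)|(\\.)|\$\{(\d+)\}|\$(\d+)/s;

# Hypothetical helper: rewrite old notation into the proposed one.
sub translate_pattern {
    my ($pat) = @_;
    $pat =~ s/$TOKEN/
        defined $1 ? '$' . $1
      : defined $2 ? $2
      : '(?P' . (defined $3 ? $3 : $4) . ')'
    /ge;
    return $pat;
}

print translate_pattern('(foo)_\1_bar'), "\n";   # (foo)_$1_bar
print translate_pattern('(bar)_${1}_$2'), "\n";  # (bar)_(?P1)_(?P2)
print translate_pattern('price: \$1'), "\n";     # price: \$1 (unchanged)
```

Doing both rewrites in a single pass matters: translating \1 to $1
first and then $1 to (?P1) in a second pass would wrongly turn every
backreference into (?P1).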

> Changing the lifetime of backreferences feels likely to be difficult,
> but it isn't clear to me what you are trying to achieve here. I think
> you at least need to add an example of how it would act under s///g
> and s///ge.

        Good point.  I'll do that.
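
        As a baseline for that example, here is a small sketch of how
$1 behaves today in the replacement part of s///g and s///ge. It shows
current Perl 5 only; how the proposed notation would act in these cases
is still to be written up.

```perl
use strict;
use warnings;

my $plain = 'a1b2';

# With /g, $1 in the replacement is re-evaluated for every match.
(my $wrapped = $plain) =~ s/(\d)/<$1>/g;

# With /ge, the replacement is code that runs once per match.
(my $doubled = $plain) =~ s/(\d)/$1 * 2/ge;

print "$wrapped\n";   # a<1>b<2>
print "$doubled\n";   # a2b4
```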

> :RFC 276: Localising Paren Counts in qr()s.
> 
> I didn't see a mention of these in the body of the proposal.

        276 is rather tangentially related, I grant.  However, I felt that
if my proposal went forward, it could impact on how 276 was implemented,
so I crossreferenced to it.

                                Dave