Re: Perlstorm #0040

2000-09-27 Thread Ilya Zakharevich

==
 I lie: the other reason qr{} currently doesn't behave like that is that
 when we interpolate a compiled regexp into a context that requires it be
 recompiled,

Interpolated qr() items shouldn't be recompiled anyway.  They should
be treated as subroutine calls.  Unfortunately, this requires a
reentrant regex engine, which Perl doesn't have.  But I think it's the
right way to go, and it would solve the backreference problem, as well
as many other related problems.
==

The REx engine is reentrant enough right now.  All you need to do is
to add the //p switch (or, meanwhile, rewrite each $qrn into (?p{ $qrn })).
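
As a minimal sketch of that rewrite: later perls spell the construct
(??{ ... }) rather than (?p{ ... }), so the sketch assumes that spelling.
The variable names and test string are borrowed from Hugo's example purely
for illustration. Each deferred subpattern is matched with its own set of
captures, so \1 inside it refers to that subpattern's own first group.

```perl
use strict;
use warnings;

# Two independently written patterns, each using \1 for its own first group.
my $e1 = qr/(a)\1/;
my $e2 = qr/(b).*\1/;

# Plain interpolation: groups are numbered across the whole combined pattern,
# so the \1 inside $e2 refers to the (a) group that came from $e1.
my $s = "aabbac";
my $interpolated = $s =~ /$e1($e2)/g ? join(":", $1, pos($s)) : "no match";

# Deferred subpatterns: each (??{ ... }) is matched with its own captures,
# which are discarded when control returns to the outer pattern.
my $t = "aabbac";
my $deferred = $t =~ /(??{ $e1 })((??{ $e2 }))/g ? join(":", $1, pos($t)) : "no match";

print "interpolated: $interpolated\n";   # a:5
print "deferred:     $deferred\n";       # bb:4
```

The cost of this approach is that captures made inside a deferred
subpattern are not visible to the outer pattern at all.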

Ilya



Re: Perlstorm #0040

2000-09-24 Thread Richard Proctor

On Sun 24 Sep, Hugo wrote:
 In [EMAIL PROTECTED], Richard Proctor writes
 :
 :TomC's perl storm has:
 :
 : Figure out way to do 
 : 
 : /$e1 $e2/
 : 
 : safely, where $e1 might have '(foo) \1' in it. 
 : and $e2 might have '(bar) \1' in it.  Those won't work.
 :
 :If e1 and e2 are qr// type things the answer might be to localise 
 :the backref numbers in each qr// expression.  
 :
 :If they are not qr//s it might still be possible to achieve this: if
 :the expansion of variables in regexes is done by the regex compiler,
 :it could recognise this context and localise the backrefs.
 :
 :Any code like this is going to have real problems with $1 etc. if used
 :later; use of assignment in a regex and named backrefs (RFC 112) would
 :make this a lot safer.
 
 I think it is reasonable to ask whether the current handling of qr{}
 subpatterns is correct:
 
 perl -wle '$a=qr/(a)\1/; $b=qr/(b).*\1/; /$a($b)/g and print join ":", $1, pos for "aabbac"'
 a:5
 
 I'm tempted to suggest it isn't; that the paren count should be local
 to each qr{}, so that the above prints 'bb:4'. I think that most people
 currently construct their qr{} patterns as if they are going to be
 handled in isolation, without regard to the context in which they are
 embedded - why else do they override the embedder's flags if not to
 achieve that?

This seems the right way to go.

 The problem then becomes: do we provide a mechanism to access the
 nested backreferences outside of the qr{} in which they were referenced,
 and if so what syntax do we offer to achieve that? I don't have an answer
 to the latter, which tempts me to answer 'no' to the former for all the
 wrong reasons. I suspect (and suggest) that complication is the only
 reason we don't currently have the behaviour I suggest the rest of the
 semantics warrant - that backreferences are localised within a qr().

With the suggestions from RFC 112, with assignment within the regex and
named backreferences, this provides a solution for anyone trying to
get at a backref inside of a nested qr().  I think this is a reasonable way
forward.

 I lie: the other reason qr{} currently doesn't behave like that is that
 when we interpolate a compiled regexp into a context that requires it be
 recompiled, we currently ignore the compiled form and act only on the
 original string. Perhaps this is also an insufficiently intelligent thing
 to do.
 
 Hugo
 

Yes, this and MJD's comment about the reentrant regex engine.  I will stick
this in an RFC in a few minutes.

Richard

-- 

[EMAIL PROTECTED]




Perlstorm #0040

2000-09-23 Thread Richard Proctor

TomC's perl storm has:

 Figure out way to do 
 
 /$e1 $e2/
 
 safely, where $e1 might have '(foo) \1' in it. 
 and $e2 might have '(bar) \1' in it.  Those won't work.

If e1 and e2 are qr// type things the answer might be to localise 
the backref numbers in each qr// expression.  

If they are not qr//s it might still be possible to achieve this: if the
expansion of variables in regexes is done by the regex compiler, it could
recognise this context and localise the backrefs.

Any code like this is going to have real problems with $1 etc. if used later;
use of assignment in a regex and named backrefs (RFC 112) would make this
a lot safer.
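
To illustrate the named-backref idea, here is a minimal sketch using the
named-capture syntax that later perls provide, (?<name>...) and \k<name>,
which is not necessarily the syntax RFC 112 proposes. The group names
first and second and the test string are made up for the example.

```perl
use strict;
use warnings;

# Numbered version: once combined, \1 in the second pattern means (foo).
my $n1 = qr/(foo) \1/;
my $n2 = qr/(bar) \1/;
my $numbered_ok = "foo foo bar bar" =~ /$n1 $n2/ ? 1 : 0;

# Named version: each pattern refers to its own group by name, so the
# reference does not shift when the patterns are combined.
my $e1 = qr/(?<first>foo) \k<first>/;
my $e2 = qr/(?<second>bar) \k<second>/;
my $named_ok = "foo foo bar bar" =~ /$e1 $e2/ ? 1 : 0;
my ($first, $second) = $named_ok ? ($+{first}, $+{second}) : ('', '');

print "numbered: $numbered_ok\n";                 # 0
print "named:    $named_ok ($first, $second)\n";  # 1 (foo, bar)
```

This only helps if the names are distinct across the patterns being
combined; two patterns that both use the same group name would still
interfere with each other.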

Richard

-- 

[EMAIL PROTECTED]




Re: Perlstorm #0040

2000-09-23 Thread Hugo

In [EMAIL PROTECTED], Richard Proctor writes
:
:TomC's perl storm has:
:
: Figure out way to do 
: 
: /$e1 $e2/
: 
: safely, where $e1 might have '(foo) \1' in it. 
: and $e2 might have '(bar) \1' in it.  Those won't work.
:
:If e1 and e2 are qr// type things the answer might be to localise 
:the backref numbers in each qr// expression.  
:
:If they are not qr//s it might still be possible to achieve this: if the
:expansion of variables in regexes is done by the regex compiler, it could
:recognise this context and localise the backrefs.
:
:Any code like this is going to have real problems with $1 etc. if used later;
:use of assignment in a regex and named backrefs (RFC 112) would make this
:a lot safer.

I think it is reasonable to ask whether the current handling of qr{}
subpatterns is correct:

perl -wle '$a=qr/(a)\1/; $b=qr/(b).*\1/; /$a($b)/g and print join ":", $1, pos for "aabbac"'
a:5

I'm tempted to suggest it isn't; that the paren count should be local
to each qr{}, so that the above prints 'bb:4'. I think that most people
currently construct their qr{} patterns as if they are going to be
handled in isolation, without regard to the context in which they are
embedded - why else do they override the embedder's flags if not to
achieve that?
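
To make the current numbering concrete, here is a small sketch that lists
every capture group of the combined pattern from the one-liner above. The
variable names are illustrative only (the one-liner uses $a and $b).

```perl
use strict;
use warnings;

my $e1 = qr/(a)\1/;
my $e2 = qr/(b).*\1/;

# In list context without /g, a successful match returns ($1, $2, $3, ...).
# Parens are counted left to right across the whole interpolated pattern:
# group 1 is (a) from $e1, group 2 is the outer ($e2), group 3 is (b).
my @groups = "aabbac" =~ /$e1($e2)/;
my $report = join ",", @groups;

print "$report\n";   # a,bba,b
```

So the (b) group that $e2 calls \1 when used on its own has become group 3,
and \1 inside $e2 now matches the 'a' captured by $e1.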

The problem then becomes: do we provide a mechanism to access the
nested backreferences outside of the qr{} in which they were referenced,
and if so what syntax do we offer to achieve that? I don't have an answer
to the latter, which tempts me to answer 'no' to the former for all the
wrong reasons. I suspect (and suggest) that complication is the only
reason we don't currently have the behaviour I suggest the rest of the
semantics warrant - that backreferences are localised within a qr().

I lie: the other reason qr{} currently doesn't behave like that is that
when we interpolate a compiled regexp into a context that requires it be
recompiled, we currently ignore the compiled form and act only on the
original string. Perhaps this is also an insufficiently intelligent thing
to do.

Hugo



Re: Perlstorm #0040

2000-09-23 Thread Mark-Jason Dominus


 I lie: the other reason qr{} currently doesn't behave like that is that
 when we interpolate a compiled regexp into a context that requires it be
 recompiled,

Interpolated qr() items shouldn't be recompiled anyway.  They should
be treated as subroutine calls.  Unfortunately, this requires a
reentrant regex engine, which Perl doesn't have.  But I think it's the
right way to go, and it would solve the backreference problem, as well
as many other related problems.