Re: RFC 331 (v1) Consolidate the $1 and C<\1> notations

2000-09-29 Thread Piers Cawley

Jonathan Scott Duff [EMAIL PROTECTED] writes:

 On Thu, Sep 28, 2000 at 08:57:39PM -, Perl6 RFC Librarian wrote:
  ${P1} means what $1 currently means (first match in last regex)
 
 I'm sorry that I don't have anything more constructive to say than
 "ick", but ... Ick.

I'm with the 'Ick' camp too. And possibly with the 'Leave it the hell
alone! If you're that bloody stupid you deserve to lose' camp too.

-- 
Piers




Re: RFC 112 (v3) Assignment within a regex

2000-09-29 Thread Richard Proctor




 On Fri, 29 Sep 2000 01:02:40 +0100, Hugo wrote:

 It also isn't clear what parts of the expression are interpolated at
 compile time; what should the following leave in %foo?
 
   %foo = ();
   $bar = "one";
   "twothree" =~ / (?$bar=two) (?$foo{$bar}=three) /x;

 It's not just that. You act as if this assignment takes place
 whenever a submatch succeeds. So:

  "twofour" =~ /(?$bar=two)(?$foo=three)/;

 Will $bar be set to "two", and $foo undef? I think not. Assignment
 should be postponed till the very end, when the match finally
 succeeds, as a whole.

In general all assignments should wait to the very end, and then assign
them all.  However before code callouts (?{...}) and enemies, the named
assignments that are currently defined should be made (localised) so that
the code can refer to them by name.  If the expression finally fails the
localised values would unroll.
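
As a rough Perl 5 sketch of that end-of-match behaviour: RFC 112's
(?$var=...) syntax does not exist in Perl 5, so this uses ordinary captures
and copies them out only after the whole pattern has succeeded. The helper
name match_and_assign and the hash keys are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: emulate "assign everything at the very end".
# Returns the named values, or an empty list if the whole match fails.
sub match_and_assign {
    my ($string) = @_;
    my %assigned;
    if ($string =~ /(two)(three)/) {
        # Reached only when the pattern as a whole succeeded.
        @assigned{qw(bar foo)} = ($1, $2);
    }
    return %assigned;
}

my %ok   = match_and_assign("twothree");
my %fail = match_and_assign("twofour");

print "bar=$ok{bar} foo=$ok{foo}\n";
print "failed match assigned ", scalar(keys %fail), " values\n";
```

With "twofour" the first group matches but the pattern as a whole fails, so
nothing is assigned, which is the outcome argued for above.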


 Therefore, I think that allowing just any l-value on the left of the "="
 sign, is not practical. Or is it?

I think any simple scalar value is reasonable.


 OTOH I would rather have that all submatches would be assigned to a
 hash, not to global or lexical variables. I have no clue about what
 syntax that would need.

That is in RFC 150, I think there is a case for both.
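
For comparison, a minimal sketch of the hash style using the named captures
and %+ hash that Perl 5.10 later added. This is not RFC 150's proposed
syntax, only an existing feature with a similar intent; the names bar and
foo are taken from the example above.

```perl
use strict;
use warnings;
use 5.010;   # named captures and %+ need Perl 5.10 or later

my %match;
if ("twothree" =~ /(?<bar>two)(?<foo>three)/) {
    # %+ holds the named groups of the last successful match; copy it so
    # the values survive later matches.
    %match = %+;
}

print join(", ", map { "$_=$match{$_}" } sort keys %match), "\n";
```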

Richard





Re: RFC 112 (v3) Asignment within a regex

2000-09-29 Thread Hugo

In [EMAIL PROTECTED], "Richard Proctor" writes:
:In general all assignments should wait to the very end, and then assign
:them all. [...] If the expression finally fails the localised values
:would unroll.

Ah, I hadn't anticipated that - I had assumed you would get whatever
was the last value set. Please can you make sure this is clearly
explained in the next version of the RFC?

Hugo



Re: RFC 316 (v1) Regex modifier for support of chunk processing and prefix matching

2000-09-29 Thread Bart Lateur

On Fri, 29 Sep 2000 13:19:47 +0100, Hugo wrote:

I think that involves
rewriting your /p example something like:
  if (/^$pat$/z) {
    print "found a complete match";
  } elsif (defined pos) {
    print "found a prefix match";
  } else {
    print "not a match";
  }

Except that this isn't exactly what would happen. Look, "1234E+2" is a
complete string matching the regex, but it could be that it's just a
prefix for "1234E+21". So, /^$pat$/z should fail. No? This doesn't seem
too intuitive, but that's a result from a minimal interface.
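
To make the ambiguity concrete, here is a small Perl 5 sketch. The pattern
is only an assumed stand-in for $pat (digits with an optional exponent), the
helper name classify is invented, and no /z modifier is used since it does
not exist. It shows what a plain anchored match reports for successive
chunks of the same input.

```perl
use strict;
use warnings;

# Assumed stand-in for $pat: digits with an optional exponent.
my $pat = qr/\d+(?:E[+-]?\d+)?/;

# Classify a buffer using only a plain anchored match.
sub classify {
    my ($buffer) = @_;
    return $buffer =~ /^$pat$/ ? "complete" : "no match";
}

my @report = map { sprintf("%s: %s", $_, classify($_)) }
             ("1234E+", "1234E+2", "1234E+21");
print "$_\n" for @report;
```

"1234E+2" and "1234E+21" both count as complete, so a complete match at the
end of a chunk cannot be taken as final, while "1234E+" is rejected by a
plain match even though it is a valid prefix.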

-- 
Bart.



Re: RFC 331 (v1) Consolidate the $1 and C<\1> notations

2000-09-29 Thread Dave Storrs



On Thu, 28 Sep 2000, Hugo wrote:

 :=item *
 :/(foo)_C<\1>_bar/
 
 Please don't do this: write C</(foo)_\1_bar/> or /(foo)_\1_bar/, but
 don't insert C in the middle: that makes it much more difficult to
 read.

Sorry; that was a global-replace error that I missed on
proofreading.

 
 :mean different things:  the second will match 'foo_foo_bar', while the
 :first will match 'foo[SOMETHING]bar' where [SOMETHING] is whatever was
 
 should be: foo_[SOMETHING]_bar

Um, yeah, it should...(jeez...I proofed this like three times,
honest!)  *blush*

 
 :captured in the B<previous> match...which could be a long, long way away,
 
 This seems a bit unfair. It is just another variable. Any variable
 you include in a pattern, you are assumed to know that it contains
 the intended value - there is nothing special about $1 in this regard.

Fair enough; the point I was trying to make was that \1 was
captured right here, while $1 was captured long, long ago in a pattern
match far, far away. The visual/cognitive difference is small, but the
programming difference is huge.
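
A short sketch of that difference as it stands in Perl 5. The string "xyz"
is just an arbitrary earlier capture chosen for illustration, and ${1} is
written with braces only to keep it visually separate from the following
underscore.

```perl
use strict;
use warnings;

# An earlier, unrelated match leaves "xyz" in $1.
"xyz" =~ /(xyz)/ or die "seed match failed";

# $1 is interpolated before the pattern is compiled, so this is
# effectively /(foo)_xyz_bar/.
my $dollar_hits_xyz = ("foo_xyz_bar" =~ /(foo)_${1}_bar/) ? 1 : 0;

# The successful match above reset $1 to "foo", so seed it again.
"xyz" =~ /(xyz)/ or die "seed match failed";
my $dollar_hits_foo = ("foo_foo_bar" =~ /(foo)_${1}_bar/) ? 1 : 0;

# \1 refers to group 1 of the pattern it appears in.
my $backref_hits_foo = ("foo_foo_bar" =~ /(foo)_\1_bar/) ? 1 : 0;
my $backref_hits_xyz = ("foo_xyz_bar" =~ /(foo)_\1_bar/) ? 1 : 0;

print "\$1 form: foo_xyz_bar=$dollar_hits_xyz foo_foo_bar=$dollar_hits_foo\n";
print "\\1 form: foo_xyz_bar=$backref_hits_xyz foo_foo_bar=$backref_hits_foo\n";
```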


 :=item *
 :${P1} means what $1 currently means (first match in last regex)
 
 Do you understand that this is the same variable as $P1? Traditionally,
 perl very rarely coopts variable names that start with alphanumerics,
 and (off the top of my head) all the ones it does so coopt are letters
 only (ARGV, AUTOLOAD, STDOUT etc). I think we need better reasons to
 extend that to all $P1-style variables.

I do understand that, and I agree with your concern.  Actually, I
didn't think that ${P1} was a particularly good notation even as I was
suggesting it...I just wanted to get the RFC up there before the deadline
so that people could discuss it.

Having now thought about it more, I think that (?P1) is
better...in other words, make references to the previous pattern match be
a regex _extension_, not a core feature (if that's a valid way to phrase
the distinction).


 What is the migration path for existing uses of $P1-style variables?

Wherever p526 sees a pattern that contains a $1, it should replace
it with (?P1).

 

 :=item *
 :s/(bar)(bell)/${P1}$2/   # changes "barbell" to "foobell"
 
 Note that in the current regexp engine, ${P1} has disappeared by the
 time matching starts. Can you explain why we need to change this?
 Note also that if you are sticking with ${P1} either we need to
 rename all existing user variables of this form, or we can no longer
 use the existing 'interpolate this string' (or eval, double-eval etc)
 routines, and have to roll our own for this (these) as well.

I'm a bit confused by the way this came out but, if I understand
what you're asking, then I believe your concerns are solved by the new
proposed syntax.  Am I right?
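
For reference, the effect of the s/(bar)(bell)/${P1}$2/ example can already
be had in Perl 5 by saving the earlier capture in an ordinary variable
first. This is only a sketch of current behaviour, not of the proposed
syntax, and the variable names are arbitrary.

```perl
use strict;
use warnings;

# Earlier match whose first capture we want to reuse.
"foo" =~ /(foo)/ or die "earlier match failed";
my $prev = $1;

# In the replacement, $1 and $2 refer to the current match, so the old
# value has to be carried in by name.
my $word = "barbell";
$word =~ s/(bar)(bell)/$prev$2/;

# Using $1 directly just puts back what was matched.
my $plain = "barbell";
$plain =~ s/(bar)(bell)/$1$2/;

print "$word $plain\n";
```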


 :This may require significant changes to the regex engine, which is a topic
 :on which I am not qualified to speak.  Could someone with more
 :knowledge/experience please chime in?
 
 Currently the regexp compiler is handed a string in which $variables
 have already interpolated. [...]

I know there are certain exceptions to this...my Camel III says
(something to the effect of--I don't have it in front of me) "if there is
any doubt as to whether something should be interpolated or left for the
Engine, it will be left for the Engine."

In any case, I don't think this needs to change.  I'm simply
changing what the names of the variables and backreferences are...\1
becomes (the new) $1, and (the current) $1 becomes (?P1)

 Changing the lifetime of backreferences feels likely to be difficult,
 but it isn't clear to me what you are trying to achieve here. I think
 you at least need to add an example of how it would act under s///g
 and s///ge.

Good point.  I'll do that.

 :RFC 276: Localising Paren Counts in qr()s.
 
 I didn't see a mention of these in the body of the proposal.

276 is rather tangentially related, I grant.  However, I felt that
if my proposal went forward, it could impact on how 276 was implemented,
so I crossreferenced to it.

Dave 




Re: RFC 331 (v1) Consolidate the $1 and C<\1> notations

2000-09-29 Thread Dave Storrs



On Fri, 29 Sep 2000, Hildo Biersma wrote:

  Currently, C<\1> and $1 have only slightly different meanings within a
  regex.  Let's consolidate them together, eliminate the differences, and
  settle on $1 as the standard.
 
 Sigh.  That would remove functionality from the language.
 
 The reason why you need \1 in a regular expression is that $1, $2, ...
 are interpolated from the previous regular expression.  This allows me
 to do a pattern match that captures variables, then use the results of
 that to create a second regular expression. (Remember: A regexp
 interpolates first, then compiles the pattern).


Umm...with all due respect, did you read the RFC?  Because what I
proposed does not eliminate any functionality.  
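
For anyone following along, the usage Hildo describes looks roughly like
this in Perl 5. The "sep=;" header line is a made-up input, and \Q...\E is
added so the interpolated capture is matched literally.

```perl
use strict;
use warnings;

# First match: learn the separator from a (hypothetical) header line.
"sep=;" =~ /^sep=(.)$/ or die "header did not match";

# Second match: $1 (";") is interpolated before compilation and quoted by
# \Q...\E; the parentheses here are new groups of this pattern.
my @pair = "left;right" =~ /^(\w+)\Q$1\E(\w+)$/;

print "fields: @pair\n";
```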

Dave




Re: RFC 348 (v1) Regex assertions in plain Perl code

2000-09-29 Thread Hugo

In [EMAIL PROTECTED], Perl6 RFC Librarian writes:
:=item assertion in Perl5
:
: (?(?{not COND})(?!))
: (?(?{not do { COND }})(?!))

Or (?(?{COND})|(?!)).
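
A small Perl 5 sketch of both spellings, with an assumed condition
($1 < 256) standing in for COND and invented sub names; it relies on the
existing (?(?{...})yes|no) conditional and on $1 being visible inside the
code block.

```perl
use strict;
use warnings;

# RFC 348's form: force failure with (?!) when the condition is NOT met.
sub octet_rfc_form {
    my ($text) = @_;
    return $text =~ /^(\d+)(?(?{ not $1 < 256 })(?!))$/ ? 1 : 0;
}

# The shorter form: empty "yes" branch, (?!) as the "no" branch.
sub octet_short_form {
    my ($text) = @_;
    return $text =~ /^(\d+)(?(?{ $1 < 256 })|(?!))$/ ? 1 : 0;
}

for my $text ("0", "255", "256", "300") {
    printf "%-3s rfc=%d short=%d\n",
        $text, octet_rfc_form($text), octet_short_form($text);
}
```

Both subs should agree on every input, which is the equivalence that any
automatic migration would depend on.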

Migration could consider replacing detectable equivalents of such
constructs with the favoured new construct.

:"local" inside embedded code will no longer be supported, nor will
 :conditional regexes. The Perl5 -> Perl6 translator should warn if it
:ever encounters one of these.

I'm not convinced that removing either of these is necessary to the
main thrust of the proposal. They may both still be useful in their
own right, and you seem to offer little evidence against them other
than that you don't like them.

I do like the idea of making (?{...}) an assertion, all the more
because we have a simple migration path that avoids unnecessarily
breaking existing scripts: wrap $code as '$^R = do { $code }; 1'.

If you want to remove support for 'local' in embedded code, it is
worth a full proposal in its own right that will explain what will
happen if people try to do that. (I think it will make perl
unnecessarily more complex to detect and disable it in this case.)
Similarly if you want to remove support for (?(...)) completely,
you need to address the utility and options for migration for all
the available uses of it, not just the one addressed by the new
handling of (?{...}).

Hugo